Los Altos, California, United States
32K followers
500+ connections
About
Consulting and teaching…
Articles by Ron
-
Should you suggest or enforce a template for hypotheses in A/B tests?
By Ron Kohavi
-
When should you use quasi-experiments instead of controlled experiments, or A/B tests? The barometer question analogy
By Ron Kohavi
Activity
-
Seen this? Love how the Kameleoon list of thought leaders in experimentation is helping surface other should-be-seen lists. This list of…
Liked by Ron Kohavi
-
CRO News | Weekly Roundup | July 22nd The best of #ABtesting and #optimization on #LinkedIn last week 👇 💬 It's…
Liked by Ron Kohavi
-
❗ FACT: The most problematic form field is "Password" (we have Zuko Analytics data to back this up) ❗ As part of a series on optimizing the #ux of…
Liked by Ron Kohavi
Experience & Education
Licenses & Certifications
Publications
-
Statistical Challenges in Online Controlled Experiments: A Review of A/B Testing Methodology
The American Statistician
The rise of internet-based services and products in the late 1990s brought about an unprecedented opportunity for online businesses to engage in large scale data-driven decision making. Over the past two decades, organizations such as Airbnb, Alibaba, Amazon, Baidu, Booking.com, Alphabet’s Google, LinkedIn, Lyft, Meta’s Facebook, Microsoft, Netflix, Twitter, Uber, and Yandex have invested tremendous resources in online controlled experiments (OCEs) to assess the impact of innovation on their customers and businesses. Running OCEs at scale has presented a host of challenges requiring solutions from many domains. In this article we review challenges that require new statistical methodologies to address them. In particular, we discuss the practice and culture of online experimentation, as well as its statistics literature, placing the current methodologies within their relevant statistical lineages and providing illustrative examples of OCE applications. Our goal is to raise academic statisticians’ awareness of these new research opportunities to increase collaboration between academia and the online industry.
-
Online Controlled Experiments and A/B Tests
Springer, New York, NY
Many good resources are available with motivation and explanations about online controlled experiments (Kohavi et al. 2009a, 2020; Thomke 2020; Luca and Bazerman 2020; Georgiev 2018, 2019; Kohavi and Thomke 2017; Siroker and Koomen 2013; Goward 2012; Schrage 2014; King et al. 2017; McFarland 2012; Manzi 2012; Tang et al. 2010). For organizations running online controlled experiments at scale, Gupta et al. (2019) provide an advanced set of challenges. We provide a motivating visual example of a controlled experiment that ran at Microsoft’s Bing. The team wanted to add a feature allowing advertisers to provide links to the target site. The rationale is that this will improve ads quality by giving users more information about what the advertiser’s site provides and allow users to directly navigate to the sub-category matching their intent. Visuals of the existing ads layout (Control) and the new ads layout (Treatment) with site links added are shown in Fig. 1.
-
A/B Testing Intuition Busters: Common Misunderstandings in Online Controlled Experiments
Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD ’22)
A/B tests, or online controlled experiments, are heavily used in industry to evaluate implementations of ideas. While the statistics behind controlled experiments are well documented and some basic pitfalls known, we have observed some seemingly intuitive concepts being touted, including by A/B tool vendors and agencies, which are misleading, often badly so. Our goal is to describe these misunderstandings, the “intuition” behind them, and to explain and bust that intuition with solid statistical reasoning. We provide recommendations that experimentation platform designers can implement to make it harder for experimenters to make these intuitive mistakes.
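One recurring family of misunderstandings the paper targets concerns how p-values behave in A/B comparisons. As a minimal sketch (not code from the paper, using hypothetical conversion counts), the standard two-proportion z-test behind a typical A/B comparison looks like this:

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference between two conversion rates."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    # Pooled rate under the null hypothesis of no difference.
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Hypothetical counts: 2,000 conversions of 100,000 users in control,
# 2,150 of 100,000 in treatment (a 7.5% relative lift on a 2% baseline).
z, p = two_proportion_z_test(2000, 100_000, 2150, 100_000)
```

With these hypothetical numbers the lift is significant at the 0.05 level but not at 0.01, a reminder that a seemingly large relative lift can still be statistically marginal at realistic traffic.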
-
Trustworthy Online Controlled Experiments: A Practical Guide to A/B Testing
Cambridge University Press
Getting numbers is easy; getting numbers you can trust is hard. This practical guide by experimentation leaders at Google, LinkedIn, and Microsoft will teach you how to accelerate innovation using trustworthy online controlled experiments, or A/B tests.
-
Online randomized controlled experiments at scale: lessons and extensions to medicine
Trials 21, 150 (2020)
Many technology companies, including Airbnb, Amazon, Booking.com, eBay, Facebook, Google, LinkedIn, Lyft, Microsoft, Netflix, Twitter, Uber, and Yahoo!/Oath, run online randomized controlled experiments at scale, namely hundreds of concurrent controlled experiments on millions of users each, commonly referred to as A/B tests. Originally derived from the same statistical roots, randomized controlled trials (RCTs) in medicine are now criticized for being expensive and difficult, while in technology, the marginal cost of such experiments is approaching zero and the value for data-driven decision-making is broadly recognized.
-
Top Challenges from the first Practical Online Controlled Experiments Summit
SIGKDD Explorations
Online controlled experiments (OCEs), also known as A/B tests, have become ubiquitous in evaluating the impact of changes made to software products and services. While the concept of online controlled experiments is simple, there are many practical challenges in running OCEs at scale that encourage further academic and industrial exploration. To understand the top practical challenges in running OCEs at scale, representatives with experience in large-scale experimentation from thirteen different organizations (Airbnb, Amazon, Booking.com, Facebook, Google, LinkedIn, Lyft, Microsoft, Netflix, Twitter, Uber, Yandex, and Stanford University) were invited to the first Practical Online Controlled Experiments Summit. All thirteen organizations sent representatives. Together these organizations tested more than one hundred thousand experiment treatments last year. Thirty-four experts from these organizations participated in the summit in Sunnyvale, CA, USA on December 13-14, 2018.
While there are papers from individual organizations on some of the challenges and pitfalls in running OCEs at scale, this is the first paper to provide the top challenges faced across the industry for running OCEs at scale and some common solutions.
[LinkedIn limits authors to 10, and I can't even list them here because I exceed the description size limit.]
-
The Surprising Power of Online Experiments
Harvard Business Review
Today, Microsoft and several other leading companies conduct more than 10,000 online controlled experiments annually, with many tests engaging millions of users. These organizations have discovered that an “experiment with everything” approach has surprisingly large payoffs.
At a time when the web is vital to almost all businesses, rigorous online experiments should be standard operating procedure. If a company develops the software infrastructure and organizational skills to conduct them, it will be able to assess not only ideas for websites but also potential business models, strategies, products, services, and marketing campaigns—all relatively inexpensively. Controlled experiments can transform decision making into a scientific, evidence-driven process—rather than an intuitive reaction. Without them, many breakthroughs might never happen, and many bad ideas would be implemented, only to fail, wasting resources.
-
Pitfalls of Long-Term Online Controlled Experiments
IEEE Big Data 2016
Online controlled experiments (e.g., A/B tests) are now regularly used to guide product development and accelerate innovation in software. Product ideas are evaluated as scientific hypotheses, and tested on web sites, mobile applications, desktop applications, services, and operating system features.
One of the key challenges for organizations that run controlled experiments is to select an Overall Evaluation Criterion (OEC), i.e., the criterion by which to evaluate the different variants. The difficulty is that short-term changes to metrics may not predict the long-term impact of a change. For example, raising prices likely increases short-term revenue but also likely reduces long-term revenue (customer lifetime value) as users abandon. Degrading search results in a Search Engine causes users to search more, thus increasing query share short-term, but increasing abandonment and thus reducing long-term customer lifetime value. Ideally, an OEC is based on metrics in a short-term experiment that are good predictors of long-term value.
To assess long-term impact, one approach is to run long-term controlled experiments and assume that long-term effects are represented by observed metrics. In this paper we share several examples of long-term experiments and the pitfalls associated with running them. We discuss cookie stability, survivorship bias, selection bias, and perceived trends, and share methodologies that can be used to partially address some of these issues.
While there is clearly value in evaluating long-term trends, experimenters running long-term experiments must be cautious, as results may be due to the above pitfalls more than the true delta between the Treatment and Control. We hope our real examples and analyses will sensitize readers to the issues and encourage the development of new methodologies for this important problem.
-
Pitfalls in Online Controlled Experiments
MIT CODE: Conference On Digital Experimentation
It's easy to run a controlled experiment and compute a p-value with five digits after the decimal point. While getting such precise numbers is easy, getting numbers you can trust is much harder. We share practical pitfalls from online controlled experiments across multiple groups at Microsoft.
-
Challenging Problems in Online Controlled Experiments
The Conference on Digital Experimentation @ MIT (CODE 2015)
Online controlled experiments are now widely run in the software industry. I share several challenging problems and motivate their importance. These include high-variance metrics, issues with p-values, metric-driven vs. design-driven decisions, novelty effects, and leaks.
-
Online Controlled Experiments: Lessons from Running A/B/n Tests for 12 years
KDD 2015 Invited Keynote
The Internet provides developers of connected software, including web sites, applications, and devices, an unprecedented opportunity to accelerate innovation by evaluating ideas quickly and accurately using trustworthy controlled experiments (e.g., A/B tests and their generalizations). From front-end user-interface changes to backend recommendation systems and relevance algorithms, from search engines (e.g., Google, Microsoft’s Bing, Yahoo) to retailers (e.g., Amazon, eBay, Netflix, Etsy) to social networking services (e.g., Facebook, LinkedIn, Twitter) to travel services (e.g., Expedia, Airbnb, Booking.com) to many startups, online controlled experiments are now utilized to make data-driven decisions at a wide range of companies. While the theory of a controlled experiment is simple, and dates back to Sir Ronald A. Fisher’s experiments at the Rothamsted Agricultural Experimental Station in England in the 1920s, the deployment and mining of online controlled experiments at scale (e.g., hundreds of experiments run every day at Bing) and deployment of online controlled experiments across dozens of web sites and applications has taught us many lessons. We provide an introduction, share real examples, key lessons, and cultural challenges.
-
Seven Rules of Thumb for Web Site Experimenters
KDD 2014
Web site owners, from small web sites to the largest properties that include Amazon, Facebook, Google, LinkedIn, Microsoft, and Yahoo, attempt to improve their web sites, optimizing for criteria ranging from repeat usage, time on site, to revenue. Having been involved in running thousands of controlled experiments at Amazon, Booking.com, LinkedIn, and multiple Microsoft properties, we share seven rules of thumb for experimenters, which we have generalized from these experiments and their results. These are principles that we believe have broad applicability in web optimization and analytics outside of controlled experiments, yet they are not provably correct, and in some cases exceptions are known.
To support these rules of thumb, we share multiple real examples, most being shared in a public paper for the first time. Some rules of thumb have previously been stated, such as “speed matters,” but we describe the assumptions in the experimental design and share additional experiments that improved our understanding of where speed matters more: certain areas of the web page are more critical.
This paper serves two goals. First, it can guide experimenters with rules of thumb that can help them optimize their sites. Second, it provides the KDD community with new research challenges on the applicability, exceptions, and extensions to these, one of the goals for KDD’s industrial track.
-
Online Controlled Experiments at Large Scale
KDD 2013
Web-facing companies, including Amazon, eBay, Etsy, Facebook, Google, Groupon, Intuit, LinkedIn, Microsoft, Netflix, Shop Direct, StumbleUpon, Yahoo, and Zynga use online controlled experiments to guide product development and accelerate innovation. At Microsoft’s Bing, the use of controlled experiments has grown exponentially over time, with over 200 concurrent experiments now running on any given day. Running experiments at large scale requires addressing multiple challenges in three areas: cultural/organizational, engineering, and trustworthiness. On the cultural and organizational front, the larger organization needs to learn the reasons for running controlled experiments and the tradeoffs between controlled experiments and other methods of evaluating ideas. We discuss why negative experiments, which degrade the user experience short term, should be run, given the learning value and long-term benefits. On the engineering side, we architected a highly scalable system, able to handle data at massive scale: hundreds of concurrent experiments, each containing millions of users. Classical testing and debugging techniques no longer apply when there are millions of live variants of the site, so alerts are used to identify issues rather than relying on heavy up-front testing. On the trustworthiness front, we have a high occurrence of false positives that we address, and we alert experimenters to statistical interactions between experiments. The Bing Experimentation System is credited with having accelerated innovation and increased annual revenues by hundreds of millions of dollars, by allowing us to find and focus on key ideas evaluated through thousands of controlled experiments. A 1% improvement to revenue equals $10M annually in the US, yet many ideas impact key metrics by 1% and are not well estimated a-priori. The system has also identified many negative features that we avoided deploying, despite key stakeholders’ early excitement, saving us similar large amounts
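The trustworthiness point above (a high occurrence of false positives when hundreds of experiments run concurrently) follows directly from the arithmetic of significance testing. A hedged illustration, not the paper's method, with a hypothetical count of 200 concurrent experiments:

```python
# With k independent tests at significance level alpha and no real
# effects at all, false positives arrive as a matter of course.
alpha, k = 0.05, 200  # hypothetical: 200 concurrent experiments

expected_false_positives = alpha * k       # 10 "wins" expected by chance
p_at_least_one = 1 - (1 - alpha) ** k      # near-certain

# A crude guard: Bonferroni correction tests each at alpha / k,
# at the cost of much lower power for every individual experiment.
bonferroni_level = alpha / k
```

The abstract notes that the Bing team addresses this in its platform; the numbers above only show why some guard, such as replication of surprising results, is needed at scale.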
-
Improving the Sensitivity of Online Controlled Experiments by Utilizing Pre-Experiment Data
WSDM 2013: The Sixth ACM International Conference on Web Search and Data Mining
Online controlled experiments are at the heart of making data-driven decisions at a diverse set of companies, including Amazon, eBay, Facebook, Google, Microsoft, Yahoo, and Zynga. Small differences in key metrics, on the order of fractions of a percent, may have very significant business implications. At Bing it is not uncommon to see experiments that impact annual revenue by millions of dollars, even tens of millions of dollars, either positively or negatively. With thousands of experiments being run annually, improving the sensitivity of experiments allows for more precise assessment of value, or equivalently running the experiments on smaller populations (supporting more experiments) or for shorter durations (improving the feedback cycle and agility). We propose an approach (CUPED) that utilizes data from the pre-experiment period to reduce metric variability and hence achieve better sensitivity. This technique is applicable to a wide variety of key business metrics, and it is practical and easy to implement. The results on Bing’s experimentation system are very successful: we can reduce variance by about 50%, effectively achieving the same statistical power with only half of the users, or half the duration.
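The core CUPED adjustment described above is small enough to sketch. This is an illustrative simulation on synthetic data (not the paper's code): the in-experiment metric Y is adjusted by a pre-experiment covariate X using theta = cov(Y, X) / var(X), which leaves the treatment effect unbiased while shrinking variance by roughly the squared correlation between X and Y.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# Synthetic users: the pre-experiment metric correlates strongly with
# the in-experiment metric; treatment adds a small true effect of 0.1.
pre = rng.normal(10.0, 2.0, n)         # pre-experiment covariate X
treatment = rng.integers(0, 2, n)      # random 50/50 assignment
y = pre + rng.normal(0.0, 1.0, n) + 0.1 * treatment

# CUPED: subtract the part of Y that is predictable from X.
theta = np.cov(y, pre)[0, 1] / np.var(pre, ddof=1)
y_cuped = y - theta * (pre - pre.mean())

variance_reduction = 1.0 - np.var(y_cuped) / np.var(y)
effect = y_cuped[treatment == 1].mean() - y_cuped[treatment == 0].mean()
```

The paper reports roughly 50% variance reduction on Bing with real pre-experiment data; the larger reduction in this simulation is an artifact of the synthetic correlation, which is higher than typical.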
-
Online Controlled Experiments: Introduction, Learnings, and Humbling Statistics
Sixth ACM Conference on Recommender Systems
The web provides an unprecedented opportunity to accelerate innovation by evaluating ideas quickly and accurately using controlled experiments (e.g., A/B tests and their generalizations). Whether for front-end user-interface changes, or backend recommendation systems and relevance algorithms, online controlled experiments are now utilized to make data-driven decisions at Amazon, Microsoft, eBay, Facebook, Google, Yahoo, Zynga, and at many other companies. While the theory of a controlled experiment is simple, and dates back to Sir Ronald A. Fisher's experiments at the Rothamsted Agricultural Experimental Station in England in the 1920s, the deployment and mining of online controlled experiments at scale-thousands of experiments now-has taught us many lessons. We provide an introduction, share real examples, key learnings, cultural challenges, and humbling statistics.
-
Trustworthy Online Controlled Experiments: Five Puzzling Outcomes Explained
KDD 2012
Online controlled experiments are often utilized to make data-driven decisions at Amazon, Microsoft, eBay, Facebook, Google, Yahoo, Zynga, and at many other companies. While the theory of a controlled experiment is simple, and dates back to Sir Ronald A. Fisher’s experiments at the Rothamsted Agricultural Experimental Station in England in the 1920s, the deployment and mining of online controlled experiments at scale—thousands of experiments now—has taught us many lessons. These exemplify the proverb that the difference between theory and practice is greater in practice than in theory. We present our learnings as they happened: puzzling outcomes of controlled experiments that we analyzed deeply to understand and explain. Each of these took multiple-person weeks to months to properly analyze and get to the often surprising root cause. The root causes behind these puzzling results are not isolated incidents; these issues generalized to multiple experiments. The heightened awareness should help readers increase the trustworthiness of the results coming out of controlled experiments. At Microsoft’s Bing, it is not uncommon to see experiments that impact annual revenue by millions of dollars, thus getting trustworthy results is critical and investing in understanding anomalies has tremendous payoff: reversing a single incorrect decision based on the results of an experiment can fund a whole team of analysts. The topics we cover include: the OEC (Overall Evaluation Criterion), click tracking, effect trends, experiment length and power, and carryover effects.
-
Online Experiments: Practical Lessons
IEEE Computer, Vol 43, issue 9, pp. 82-85
When running online experiments, getting numbers is easy; getting numbers you can trust is hard.
-
Controlled Experiments on the Web: Survey and Practical Guide
Data Mining and Knowledge Discovery journal, Vol 18(1), p. 140-181
The web provides an unprecedented opportunity to evaluate ideas quickly using controlled experiments, also called randomized experiments, A/B tests (and their generalizations), split tests, Control/Treatment tests, MultiVariable Tests (MVT), and parallel flights. Controlled experiments embody the best scientific design for establishing a causal relationship between changes and their influence on user-observable behavior. We provide a practical guide to conducting online experiments, where end users can help guide the development of features. Our experience indicates that significant learning and return-on-investment (ROI) are seen when development teams listen to their customers, not to the Highest Paid Person’s Opinion (HiPPO). We provide several examples of controlled experiments with surprising results. We review the important ingredients of running controlled experiments, and discuss their limitations (both technical and organizational). We focus on several areas that are critical to experimentation, including statistical power, sample size, and techniques for variance reduction. We describe common architectures for experimentation systems and analyze their advantages and disadvantages. We evaluate randomization and hashing techniques, which we show are not as simple in practice as is often assumed. Controlled experiments typically generate large amounts of data, which can be analyzed using data mining techniques to gain deeper understanding of the factors influencing the outcome of interest, leading to new hypotheses and creating a virtuous cycle of improvements. Organizations that embrace controlled experiments with clear evaluation criteria can evolve their systems with automated optimizations and real-time analyses. Based on our extensive practical experience with multiple systems and organizations, we share key lessons that will help practitioners in running trustworthy controlled experiments.
-
Online Experimentation at Microsoft
Microsoft ThinkWeek Paper, recognized as top 30
Controlled experiments, also called randomized experiments and A/B tests, have had a profound influence on multiple fields, including medicine, agriculture, manufacturing, and advertising. Through randomization and proper design, experiments allow establishing causality scientifically, which is why they are the gold standard in drug tests. In software development, multiple techniques are used to define product requirements; controlled experiments provide a valuable way to assess the impact of new features on customer behavior. At Microsoft, we have built the capability for running controlled experiments on web sites and services, thus enabling a more scientific approach to evaluating ideas at different stages of the planning process. In our previous papers, we did not have good examples of controlled experiments at Microsoft; now we do! The humbling results we share bring to question whether a-priori prioritization is as good as most people believe it is. The Experimentation Platform (ExP) was built to accelerate innovation through trustworthy experimentation. Along the way, we had to tackle both technical and cultural challenges and we provided software developers, program managers, and designers the benefit of an unbiased ear to listen to their customers and make data-driven decisions. A technical survey of the literature on controlled experiments was recently published by us in a journal (Kohavi, Longbotham, Sommerfield, & Henne, 2009). The goal of this paper is to share lessons and challenges focused more on the cultural aspects and the value of controlled experiments.
-
An Empirical Comparison of Voting Classification Algorithms: Bagging, Boosting, and Variants
Machine Learning journal, Vol 36, Nos. 1/2, pages 105-139
Methods for voting classication algorithms, such as Bagging and AdaBoost, have been shown to be very successful in improving the accuracy of certain classiers for articial and realworld datasets. We review these algorithms and describe a large empirical study comparing several variants in conjunction with a decision tree inducer (three variants) and a Naive-Bayes inducer. The purpose of the study is to improve our understanding of why and when these algorithms,which use perturbation…
Methods for voting classification algorithms, such as Bagging and AdaBoost, have been shown to be very successful in improving the accuracy of certain classifiers for artificial and real-world datasets. We review these algorithms and describe a large empirical study comparing several variants in conjunction with a decision tree inducer (three variants) and a Naive-Bayes inducer. The purpose of the study is to improve our understanding of why and when these algorithms, which use perturbation, reweighting, and combination techniques, affect classification error. We provide a bias and variance decomposition of the error to show how different methods and variants influence these two terms. This allowed us to determine that Bagging reduced the variance of unstable methods, while boosting methods (AdaBoost and Arc-x4) reduced both the bias and variance of unstable methods but increased the variance for Naive-Bayes, which was very stable. We observed that Arc-x4 behaves differently than AdaBoost if reweighting is used instead of resampling, indicating a fundamental difference. Voting variants, some of which are introduced in this paper, include: pruning versus no pruning, use of probabilistic estimates, weight perturbations (Wagging), and backfitting of data. We found that Bagging improves when probabilistic estimates in conjunction with no-pruning are used, as well as when the data was backfit. We measure tree sizes and show an interesting positive correlation between the increase in the average tree size in AdaBoost trials and its success in reducing the error. We compare the mean-squared error of voting methods to non-voting methods and show that the voting methods lead to large and significant reductions in the mean-squared errors. Practical problems that arise in implementing boosting algorithms are explored, including numerical instabilities and underflows.
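As a refresher on the first technique compared, here is a minimal sketch of Bagging: each round trains a base learner (a one-dimensional decision stump, an unstable learner of the kind the paper discusses) on a bootstrap resample, and predictions are made by majority vote. Function names and data are illustrative, not from the paper:

```python
import random
from collections import Counter

def train_stump(points):
    """Fit a 1-D decision stump: pick the threshold and side labels
    with the fewest training errors."""
    best = None
    for t in sorted({x for x, _ in points}):
        for lo, hi in ((0, 1), (1, 0)):
            err = sum((lo if x <= t else hi) != y for x, y in points)
            if best is None or err < best[0]:
                best = (err, t, lo, hi)
    _, t, lo, hi = best
    return lambda x: lo if x <= t else hi

def bagged_predict(points, x, n_rounds=25, seed=0):
    """Bagging: train one stump per bootstrap resample, then majority-vote."""
    rng = random.Random(seed)
    votes = Counter()
    for _ in range(n_rounds):
        resample = [rng.choice(points) for _ in points]  # sample with replacement
        votes[train_stump(resample)(x)] += 1
    return votes.most_common(1)[0][0]

# Illustrative 1-D data: the label is 1 exactly when x > 5.
data = [(i, int(i > 5)) for i in range(10)]
```

Averaging over resamples is what reduces the variance of the unstable base learner, the effect the bias/variance decomposition in the paper quantifies.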
-
Wrappers for Feature Subset Selection
Artificial Intelligence journal (97)
In the feature subset selection problem, a learning algorithm is faced with the problem of selecting a relevant subset of features upon which to focus its attention, while ignoring the rest. To achieve the best possible performance with a particular learning algorithm on a particular training set, a feature subset selection method should consider how the algorithm and the training set interact. We explore the relation between optimal feature subset selection and relevance. Our wrapper method searches for an optimal feature subset tailored to a particular algorithm and a domain. We study the strengths and weaknesses of the wrapper approach and show a series of improved designs. We compare the wrapper approach to induction without feature subset selection and to Relief, a filter approach to feature subset selection. Significant improvement in accuracy is achieved for some datasets for the two families of induction algorithms used: decision trees and Naive Bayes.
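The wrapper idea — scoring candidate feature subsets with the target learning algorithm itself — can be sketched as a greedy forward search. The `toy_eval` scorer below is a hypothetical stand-in for the cross-validated accuracy a real wrapper would compute:

```python
def forward_select(features, evaluate):
    """Greedy wrapper search: grow the selected feature subset as long as
    the learner's estimated accuracy (via `evaluate`) keeps improving."""
    selected, best = [], evaluate([])
    while True:
        gains = [(evaluate(selected + [f]), f)
                 for f in features if f not in selected]
        if not gains:
            break
        score, f = max(gains)
        if score <= best:
            break
        selected.append(f)
        best = score
    return selected, best

def toy_eval(subset):
    """Hypothetical accuracy estimate: rewards two relevant features and
    mildly penalizes irrelevant ones (stands in for cross-validation)."""
    relevant = {"age", "income"}
    return 0.5 + 0.2 * len(relevant & set(subset)) - 0.05 * len(set(subset) - relevant)
```

Because every candidate subset is scored by (a proxy for) the induction algorithm itself, the search is tailored to that algorithm and dataset — the defining property of the wrapper approach, in contrast to filters like Relief.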
-
A Study of Cross-Validation and Bootstrap for Accuracy Estimation and Model Selection
IJCAI
We review accuracy estimation methods and compare the two most common methods: cross-validation and bootstrap. Recent experimental results on artificial data and theoretical results in restricted settings have shown that for selecting a good classifier from a set of classifiers (model selection), ten-fold cross-validation may be better than the more expensive leave-one-out cross-validation. We report on a large-scale experiment -- over half a million runs of C4.5 and a Naive-Bayes algorithm -- to estimate the effects of different parameters on these algorithms on real-world datasets. For cross-validation, we vary the number of folds and whether the folds are stratified or not; for bootstrap, we vary the number of bootstrap samples. Our results indicate that for real-world datasets similar to ours, the best method to use for model selection is ten-fold stratified cross-validation, even if computation power allows using more folds.
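The recommended procedure, ten-fold stratified cross-validation, can be sketched as follows; `stratified_folds` deals each class out round-robin so class proportions are preserved in every fold (an illustrative implementation, not the paper's code):

```python
import random
from collections import defaultdict

def stratified_folds(labeled, k=10, seed=0):
    """Split (example, label) pairs into k folds, dealing each class
    round-robin so class proportions are preserved (stratification)."""
    by_class = defaultdict(list)
    for pair in labeled:
        by_class[pair[1]].append(pair)
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for items in by_class.values():
        rng.shuffle(items)
        for i, pair in enumerate(items):
            folds[i % k].append(pair)
    return folds

def cross_validate(labeled, train, k=10):
    """Average held-out accuracy of `train` over k stratified folds."""
    folds = stratified_folds(labeled, k)
    accuracies = []
    for i in range(k):
        held_out = folds[i]
        rest = [pair for j, fold in enumerate(folds) if j != i for pair in fold]
        model = train(rest)  # `train` returns a callable classifier
        accuracies.append(sum(model(x) == y for x, y in held_out) / len(held_out))
    return sum(accuracies) / k
```

Stratification keeps each fold's class mix close to the overall distribution, which is what reduces the variance of the accuracy estimate relative to plain random folds.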
-
Supervised and Unsupervised Discretization of Continuous Features
Machine Learning
Many supervised machine learning algorithms require a discrete feature space. In this paper, we review previous work on continuous feature discretization, identify defining characteristics of the methods, and conduct an empirical evaluation of several methods. We compare binning, an unsupervised discretization method, to entropy-based and purity-based methods, which are supervised algorithms. We found that the performance of the Naive-Bayes algorithm significantly improved when features were discretized using an entropy-based method. In fact, over the 16 tested datasets, the discretized version of Naive-Bayes slightly outperformed C4.5 on average. We also show that in some cases, the performance of the C4.5 induction algorithm significantly improved if features were discretized in advance; in our experiments, the performance never significantly degraded, an interesting phenomenon considering the fact that C4.5 is capable of locally discretizing features.
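The core step of entropy-based discretization is choosing a cut point that minimizes the weighted class entropy of the two resulting intervals; the full method applies this recursively with a stopping criterion. A minimal sketch of the single-cut step (illustrative, not the paper's implementation):

```python
import math

def entropy(labels):
    """Shannon entropy of a list of class labels, in bits."""
    n = len(labels)
    return -sum((labels.count(y) / n) * math.log2(labels.count(y) / n)
                for y in set(labels))

def best_split(pairs):
    """Pick the cut point on a continuous feature that minimizes the
    weighted class entropy of the two resulting intervals."""
    pairs = sorted(pairs)  # sort by feature value
    best = None
    for i in range(1, len(pairs)):
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        cut = (pairs[i - 1][0] + pairs[i][0]) / 2  # midpoint between neighbors
        score = (len(left) * entropy(left) + len(right) * entropy(right)) / len(pairs)
        if best is None or score < best[0]:
            best = (score, cut)
    return best[1]
```

Unlike equal-width binning, this supervised step places cut points where the class labels actually change, which is why entropy-based discretization helped Naive-Bayes in the study.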
Patents
-
Changing results after back button use or duplicate request
Issued US 9,129,018
Enhancements of the user experience are provided when a user returns to a previously viewed page, such as a previously viewed page of search results. When a user returns to a previously viewed page, additional context information from a user's actions since the initial view of a page can be used to modify the previously viewed page and/or obtain a new version of the previously viewed page. In situations where the previously viewed page corresponds to a page of responsive results from a search engine, the modified and/or new version of the search engine results page can include an expanded or reduced group of results, different types of results, different rankings for existing results, or a combination thereof.
-
Active hip
Issued US 8,433,916
Computing services that unwanted entities may wish to access for improper, and potentially illegal, use can be more effectively protected by using Active HIP systems and methodologies. An Active HIP involves dynamically swapping one random HIP challenge, e.g., but not limited to, image, for a second random HIP challenge, e.g., but not limited to, image. An Active HIP can also, or otherwise, involve stitching together, or otherwise collecting and including, within Active HIP software, i.e., a HIP web page, to be executed by a computing device of a user seeking access to a HIP-protected computing service x number of software executables randomly selected from a pool of y number of software executables. The x number of software executables, when run, generates a random Active HIP key. If the generated Active HIP key accompanies a correct user response to the valid HIP challenge the system and/or methodology can assume with a degree of certainty that the current user is a legitimate human user and allow the current user access to the requested computing service.
-
Method and System for Determining Whether an Offering is Controversial Based on User Feedback
Issued US 8412557
The controversiality of an offering in a computer implemented system is computed based on user satisfaction feedback. A controversiality index can be provided to indicate the extent to which the offering is controversial.
-
Continuous usability trial for a website
Issued US 8,185,608
A continuous website trial allows ongoing observation of user interactions with a website for an indefinite period of time that is not ascertainable at initiation of the trial. Users are randomly assigned to either a control group or one or more test groups. The control and test groups are served different sets of web pages, even though they access the same website. During the trial, the web pages for the control group are held constant over time, while the web pages for the test group(s) undergo multiple modifications on separate occasions over time. As the web pages for the test group(s) are modified, statistical data collection continues to learn how user behavior changes as a result of the modifications. The statistical data obtained from the users of the various groups may be compared and contrasted and used to gain a better understanding of customer experience with the website.
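Random assignment of users to control and test groups is commonly implemented in practice with deterministic hashing, so a returning user always lands in the same group. A sketch of that common technique (not the specific mechanism claimed in the patent); the 50/50 weights are illustrative:

```python
import hashlib

def assign_variant(user_id, experiment, weights=(("control", 50), ("test", 50))):
    """Deterministically assign a user to a variant by hashing the user id
    together with the experiment name, then mapping the resulting bucket
    to weighted variants. The same user always gets the same variant."""
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    bucket = int(digest, 16) % 100  # stable bucket in [0, 100)
    for variant, weight in weights:
        if bucket < weight:
            return variant
        bucket -= weight
    return weights[-1][0]
```

Including the experiment name in the hash decorrelates bucket assignments across experiments, so a user in "test" for one trial is not systematically in "test" for the next.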
-
Detection of behavior-based associations between search strings and items
Issued US 8,112,429
A system and method are disclosed for automatically detecting associations between particular sets of search criteria, such as particular search strings, and particular items. Actions of users of an interactive system, such as a web site, are monitored over time to generate event histories reflective of searches, item selection actions, and possibly other types of user actions. An analysis component collectively analyzes the event histories to automatically identify and quantify associations between specific search strings (or other types of search criteria) and specific items. As part of this process, a decay function reduces the weight given to a post-search item selection event based on intervening events that occur between the search event and the item selection event.
-
Bayes rule based and decision tree hybrid classifier
US 6,026,399
The present invention provides a hybrid classifier, called the NB-Tree classifier, for classifying a set of records.
-
Method system and computer program product for visualizing an evidence classifier
US 6,460,049
A method, system, and computer program product visualizes the structure of an evidence classifier.
-
Method, system, and computer program product for visualizing a decision-tree classifier
US 6,278,464
A method, system, and a computer program product for visualizing a decision-tree classifier are provided.
-
Strategies for providing diverse recommendations
US 7,542,951
-
Strategies for providing novel recommendations
US 7,584,159
-
System and method for selection of important attributes
US 6,026,399
A system and method determines how well various attributes in a record discriminate different values of a chosen label attribute.
Honors & Awards
-
59 A/B testing influencers you need to follow in 2023
Kameleoon
-
60 influencers in A/B testing you need to follow in 2022
Kameleoon
https://www.kameleoon.com/en/blog/60-influencers-ab-testing-you-need-follow-2022
-
82 influencers in A/B testing that you need to know in 2021
Kameleoon
https://www.kameleoon.com/en/blog/top-ab-testing-influencers
-
Over 50,000 citations to papers
https://scholar.google.com/citations?hl=en&user=O3RYHGwAAAAJ&view_op=list_works&pagesize=100
-
Experimentation lifetime achievement award
https://experimentationcultureawards.com/
https://experimentationcultureawards.com/#ronnykohavi
https://www.linkedin.com/posts/ronnyk_expca2020-abtest-experimentguide-activity-6714985210947207168-7Hec
-
Quora Most Viewed Writer in A/B Testing (adjusts real-time, usually top 3)
Quora
https://www.quora.com/topic/A-B-Testing/writers
-
AMiner 5th most influential scholar in AI, 26th most influential scholar in Machine Learning
https://aminer.org/mostinfluentialscholar/ai
https://aminer.org/mostinfluentialscholar/ml
-
Forbes article: A Massive Social Experiment On You Is Under Way, And You Will Love It
Forbes
Quoted in http://www.forbes.com/sites/parmyolson/2015/01/21/jawbone-guinea-pig-economy/
-
IEEE Tools with Artificial Intelligence best paper award
IEEE
IEEE Tools With Artificial Intelligence Best Paper Award for the paper Data Mining using MLC++, a Machine Learning Library in C++ by Kohavi, Sommerfield, and Dougherty.
-
President's award (top 5%) each year of BA degree
Technion
Languages
-
Hebrew
Native or bilingual proficiency
-
English
Native or bilingual proficiency
Organizations
-
SIGKDD
-
Recommendations received
22 people have recommended Ron
More activity by Ron
-
I cancelled my Loom membership last week. And I was shocked. Pleasantly shocked. Shocked at how easy it was to cancel. It took me 5 screens and 6…
Liked by Ron Kohavi
-
I'm not big on the whole influencer thing but I'd like to thank Kameleoon for including me in the 2024 list with so many other kickass people. I'm…
Liked by Ron Kohavi
-
Happy Friday to everyone... but an ESPECIALLY happy Friday to those grinders out there who are being recognized for their work in helping shape /…
Liked by Ron Kohavi
-
𝗪𝗵𝗲𝗿𝗲 𝘁𝗼 𝗙𝗶𝗻𝗱 𝗜𝗱𝗲𝗮𝘀 𝗳𝗼𝗿 𝗔/𝗕 𝗧𝗲𝘀𝘁𝘀? I often encounter a problem where the team is stuck, not knowing where to get ideas for…
Liked by Ron Kohavi
-
The two-minute video for my KDD 2024 paper on False Positives in A/B tests with Nanyu Chen was posted by ACM at https://lnkd.in/gNwnvhfE The paper…
Shared by Ron Kohavi
-
A/B Testing myth: “Longer tests = more sample size = more power” The shared post gives a great formal intuition why “longer tests = more sample size…
Liked by Ron Kohavi
Others named Ron Kohavi
-
Ron Kohavi Nusbaum
Dynamics365 ERP consultant - Implementation
-
Ron Kohavi
Warehouse Operator at Comett progress
2 others named Ron Kohavi are on LinkedIn