Ron Kohavi

Los Altos, California, United States
32K followers · 500+ connections

About

Resume: http://www.kohavi.com/RonnyKResume.pdf

Consulting and teaching…

Publications

  • Statistical Challenges in Online Controlled Experiments: A Review of A/B Testing Methodology

    The American Statistician

    The rise of internet-based services and products in the late 1990s brought about an unprecedented opportunity for online businesses to engage in large scale data-driven decision making. Over the past two decades, organizations such as Airbnb, Alibaba, Amazon, Baidu, Booking.com, Alphabet’s Google, LinkedIn, Lyft, Meta’s Facebook, Microsoft, Netflix, Twitter, Uber, and Yandex have invested tremendous resources in online controlled experiments (OCEs) to assess the impact of innovation on their customers and businesses. Running OCEs at scale has presented a host of challenges requiring solutions from many domains. In this article we review challenges that require new statistical methodologies to address them. In particular, we discuss the practice and culture of online experimentation, as well as its statistics literature, placing the current methodologies within their relevant statistical lineages and providing illustrative examples of OCE applications. Our goal is to raise academic statisticians’ awareness of these new research opportunities to increase collaboration between academia and the online industry.

  • Online Controlled Experiments and A/B Tests

    Springer, New York, NY

    Many good resources are available with motivation and explanations about online controlled experiments (Kohavi et al. 2009a, 2020; Thomke 2020; Luca and Bazerman 2020; Georgiev 2018, 2019; Kohavi and Thomke 2017; Siroker and Koomen 2013; Goward 2012; Schrage 2014; King et al. 2017; McFarland 2012; Manzi 2012; Tang et al. 2010). For organizations running online controlled experiments at scale, Gupta et al. (2019) provide an advanced set of challenges. We provide a motivating visual example of a controlled experiment that ran at Microsoft’s Bing. The team wanted to add a feature allowing advertisers to provide links to the target site. The rationale is that this will improve ads quality by giving users more information about what the advertiser’s site provides and allow users to directly navigate to the sub-category matching their intent. Visuals of the existing ads layout (Control) and the new ads layout (Treatment) with site links added are shown in Fig. 1.

  • A/B Testing Intuition Busters: Common Misunderstandings in Online Controlled Experiments

    Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD ’22)

    A/B tests, or online controlled experiments, are heavily used in industry to evaluate implementations of ideas. While the statistics behind controlled experiments are well documented and some basic pitfalls known, we have observed some seemingly intuitive concepts being touted, including by A/B tool vendors and agencies, which are misleading, often badly so. Our goal is to describe these misunderstandings, the “intuition” behind them, and to explain and bust that intuition with solid statistical reasoning. We provide recommendations that experimentation platform designers can implement to make it harder for experimenters to make these intuitive mistakes.

  • Trustworthy Online Controlled Experiments: A Practical Guide to A/B Testing

    Cambridge University Press

    Getting numbers is easy; getting numbers you can trust is hard. This practical guide by experimentation leaders at Google, LinkedIn, and Microsoft will teach you how to accelerate innovation using trustworthy online controlled experiments, or A/B tests.

  • Online randomized controlled experiments at scale: lessons and extensions to medicine

    Trials 21, 150 (2020)

    Many technology companies, including Airbnb, Amazon, Booking.com, eBay, Facebook, Google, LinkedIn, Lyft, Microsoft, Netflix, Twitter, Uber, and Yahoo!/Oath, run online randomized controlled experiments at scale, namely hundreds of concurrent controlled experiments on millions of users each, commonly referred to as A/B tests. Originally derived from the same statistical roots, randomized controlled trials (RCTs) in medicine are now criticized for being expensive and difficult, while in technology, the marginal cost of such experiments is approaching zero and the value for data-driven decision-making is broadly recognized.

  • Top Challenges from the first Practical Online Controlled Experiments Summit

    SIGKDD Explorations

    Online controlled experiments (OCEs), also known as A/B tests, have become ubiquitous in evaluating the impact of changes made to software products and services. While the concept of online controlled experiments is simple, there are many practical challenges in running OCEs at scale that encourage further academic and industrial exploration. To understand the top practical challenges in running OCEs at scale, representatives with experience in large-scale experimentation from thirteen different organizations (Airbnb, Amazon, Booking.com, Facebook, Google, LinkedIn, Lyft, Microsoft, Netflix, Twitter, Uber, Yandex, and Stanford University) were invited to the first Practical Online Controlled Experiments Summit. All thirteen organizations sent representatives. Together these organizations tested more than one hundred thousand experiment treatments last year. Thirty-four experts from these organizations participated in the summit in Sunnyvale, CA, USA on December 13-14, 2018.

    While there are papers from individual organizations on some of the challenges and pitfalls in running OCEs at scale, this is the first paper to provide the top challenges faced across the industry for running OCEs at scale and some common solutions.


    [LinkedIn limits authors to 10, and I can't even post them here because I exceed the description size limit.]

  • The Surprising Power of Online Experiments

    Harvard Business Review

    Today, Microsoft and several other leading companies conduct more than 10,000 online controlled experiments annually, with many tests engaging millions of users. These organizations have discovered that an “experiment with everything” approach has surprisingly large payoffs.

    At a time when the web is vital to almost all businesses, rigorous online experiments should be standard operating procedure. If a company develops the software infrastructure and organizational skills to conduct them, it will be able to assess not only ideas for websites but also potential business models, strategies, products, services, and marketing campaigns—all relatively inexpensively. Controlled experiments can transform decision making into a scientific, evidence-driven process—rather than an intuitive reaction. Without them, many breakthroughs might never happen, and many bad ideas would be implemented, only to fail, wasting resources.

  • Pitfalls of Long-Term Online Controlled Experiments

    IEEE Big Data 2016

    Online controlled experiments (e.g., A/B tests) are now regularly used to guide product development and accelerate innovation in software. Product ideas are evaluated as scientific hypotheses, and tested on web sites, mobile applications, desktop applications, services, and operating system features.

    One of the key challenges for organizations that run controlled experiments is to select an Overall Evaluation Criterion (OEC), i.e., the criterion by which to evaluate the different variants. The difficulty is that short-term changes to metrics may not predict the long-term impact of a change. For example, raising prices likely increases short-term revenue but also likely reduces long-term revenue (customer lifetime value) as users abandon. Degrading search results in a Search Engine causes users to search more, thus increasing query share short-term, but increasing abandonment and thus reducing long-term customer lifetime value. Ideally, an OEC is based on metrics in a short-term experiment that are good predictors of long-term value.

    To assess long-term impact, one approach is to run long-term controlled experiments and assume that long-term effects are represented by observed metrics. In this paper we share several examples of long-term experiments and the pitfalls associated with running them. We discuss cookie stability, survivorship bias, selection bias, and perceived trends, and share methodologies that can be used to partially address some of these issues.

    While there is clearly value in evaluating long-term trends, experimenters running long-term experiments must be cautious, as results may be due to the above pitfalls more than the true delta between the Treatment and Control. We hope our real examples and analyses will sensitize readers to the issues and encourage the development of new methodologies for this important problem.

  • Pitfalls in Online Controlled Experiments

    MIT CODE: Conference On Digital Experimentation

    It's easy to run a controlled experiment and compute a p-value with five digits after the decimal point.  While getting such precise numbers is easy, getting numbers you can trust is much harder. We share practical pitfalls from online controlled experiments across multiple groups at Microsoft.

  • Challenging Problems in Online Controlled Experiments

    The Conference on Digital Experimentation @ MIT (CODE 2015)

    Online controlled experiments are now widely run in the software industry. I share several challenging problems and motivate their importance. These include high-variance metrics, issues with p-values, metric-driven vs. design-driven decisions, novelty effects, and leaks.

  • Online Controlled Experiments: Lessons from Running A/B/n Tests for 12 years

    KDD 2015 Invited Keynote

    The Internet provides developers of connected software, including web sites, applications, and devices, an unprecedented opportunity to accelerate innovation by evaluating ideas quickly and accurately using trustworthy controlled experiments (e.g., A/B tests and their generalizations). From front-end user-interface changes to backend recommendation systems and relevance algorithms, from search engines (e.g., Google, Microsoft’s Bing, Yahoo) to retailers (e.g., Amazon, eBay, Netflix, Etsy) to social networking services (e.g., Facebook, LinkedIn, Twitter) to travel services (e.g., Expedia, Airbnb, Booking.com) to many startups, online controlled experiments are now utilized to make data-driven decisions at a wide range of companies. While the theory of a controlled experiment is simple, and dates back to Sir Ronald A. Fisher’s experiments at the Rothamsted Agricultural Experimental Station in England in the 1920s, the deployment and mining of online controlled experiments at scale (e.g., hundreds of experiments run every day at Bing) and deployment of online controlled experiments across dozens of web sites and applications has taught us many lessons. We provide an introduction, share real examples, key lessons, and cultural challenges.

  • Seven Rules of Thumb for Web Site Experimenters

    KDD 2014

    Web site owners, from small web sites to the largest properties that include Amazon, Facebook, Google, LinkedIn, Microsoft, and Yahoo, attempt to improve their web sites, optimizing for criteria ranging from repeat usage, time on site, to revenue. Having been involved in running thousands of controlled experiments at Amazon, Booking.com, LinkedIn, and multiple Microsoft properties, we share seven rules of thumb for experimenters, which we have generalized from these experiments and their results. These are principles that we believe have broad applicability in web optimization and analytics outside of controlled experiments, yet they are not provably correct, and in some cases exceptions are known.

    To support these rules of thumb, we share multiple real examples, most being shared in a public paper for the first time. Some rules of thumb have previously been stated, such as “speed matters,” but we describe the assumptions in the experimental design and share additional experiments that improved our understanding of where speed matters more: certain areas of the web page are more critical.

    This paper serves two goals. First, it can guide experimenters with rules of thumb that can help them optimize their sites. Second, it provides the KDD community with new research challenges on the applicability, exceptions, and extensions to these, one of the goals for KDD’s industrial track.

  • Online Controlled Experiments at Large Scale

    KDD 2013

    Web-facing companies, including Amazon, eBay, Etsy, Facebook, Google, Groupon, Intuit, LinkedIn, Microsoft, Netflix, Shop Direct, StumbleUpon, Yahoo, and Zynga use online controlled experiments to guide product development and accelerate innovation. At Microsoft’s Bing, the use of controlled experiments has grown exponentially over time, with over 200 concurrent experiments now running on any given day. Running experiments at large scale requires addressing multiple challenges in three areas: cultural/organizational, engineering, and trustworthiness. On the cultural and organizational front, the larger organization needs to learn the reasons for running controlled experiments and the tradeoffs between controlled experiments and other methods of evaluating ideas. We discuss why negative experiments, which degrade the user experience short term, should be run, given the learning value and long-term benefits. On the engineering side, we architected a highly scalable system, able to handle data at massive scale: hundreds of concurrent experiments, each containing millions of users. Classical testing and debugging techniques no longer apply when there are millions of live variants of the site, so alerts are used to identify issues rather than relying on heavy up-front testing. On the trustworthiness front, we have a high occurrence of false positives that we address, and we alert experimenters to statistical interactions between experiments. The Bing Experimentation System is credited with having accelerated innovation and increased annual revenues by hundreds of millions of dollars, by allowing us to find and focus on key ideas evaluated through thousands of controlled experiments. A 1% improvement to revenue equals $10M annually in the US, yet many ideas impact key metrics by 1% and are not well estimated a-priori. The system has also identified many negative features that we avoided deploying, despite key stakeholders’ early excitement, saving us similar large amounts.

  • Improving the Sensitivity of Online Controlled Experiments by Utilizing Pre-Experiment Data

    WSDM 2013: The Sixth ACM International Conference on Web Search and Data Mining

    Online controlled experiments are at the heart of making data-driven decisions at a diverse set of companies, including Amazon, eBay, Facebook, Google, Microsoft, Yahoo, and Zynga. Small differences in key metrics, on the order of fractions of a percent, may have very significant business implications. At Bing it is not uncommon to see experiments that impact annual revenue by millions of dollars, even tens of millions of dollars, either positively or negatively. With thousands of experiments being run annually, improving the sensitivity of experiments allows for more precise assessment of value, or equivalently running the experiments on smaller populations (supporting more experiments) or for shorter durations (improving the feedback cycle and agility). We propose an approach (CUPED) that utilizes data from the pre-experiment period to reduce metric variability and hence achieve better sensitivity. This technique is applicable to a wide variety of key business metrics, and it is practical and easy to implement. The results on Bing’s experimentation system are very successful: we can reduce variance by about 50%, effectively achieving the same statistical power with only half of the users, or half the duration.
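
    (An illustrative code sketch of the CUPED adjustment appears at the end of the Publications section below.)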

  • Online Controlled Experiments: Introduction, Learnings, and Humbling Statistics

    Sixth ACM Conference on Recommender Systems

    The web provides an unprecedented opportunity to accelerate innovation by evaluating ideas quickly and accurately using controlled experiments (e.g., A/B tests and their generalizations). Whether for front-end user-interface changes, or backend recommendation systems and relevance algorithms, online controlled experiments are now utilized to make data-driven decisions at Amazon, Microsoft, eBay, Facebook, Google, Yahoo, Zynga, and at many other companies. While the theory of a controlled experiment is simple, and dates back to Sir Ronald A. Fisher's experiments at the Rothamsted Agricultural Experimental Station in England in the 1920s, the deployment and mining of online controlled experiments at scale (thousands of experiments now) has taught us many lessons. We provide an introduction, share real examples, key learnings, cultural challenges, and humbling statistics.

  • Trustworthy Online Controlled Experiments: Five Puzzling Outcomes Explained

    KDD 2012

    Online controlled experiments are often utilized to make data-driven decisions at Amazon, Microsoft, eBay, Facebook, Google, Yahoo, Zynga, and at many other companies. While the theory of a controlled experiment is simple, and dates back to Sir Ronald A. Fisher’s experiments at the Rothamsted Agricultural Experimental Station in England in the 1920s, the deployment and mining of online controlled experiments at scale—thousands of experiments now—has taught us many lessons. These exemplify the proverb that the difference between theory and practice is greater in practice than in theory. We present our learnings as they happened: puzzling outcomes of controlled experiments that we analyzed deeply to understand and explain. Each of these took multiple-person weeks to months to properly analyze and get to the often surprising root cause. The root causes behind these puzzling results are not isolated incidents; these issues generalized to multiple experiments. The heightened awareness should help readers increase the trustworthiness of the results coming out of controlled experiments. At Microsoft’s Bing, it is not uncommon to see experiments that impact annual revenue by millions of dollars, thus getting trustworthy results is critical and investing in understanding anomalies has tremendous payoff: reversing a single incorrect decision based on the results of an experiment can fund a whole team of analysts. The topics we cover include: the OEC (Overall Evaluation Criterion), click tracking, effect trends, experiment length and power, and carryover effects.

  • Online Experiments: Practical Lessons

    IEEE Computer, Vol 43, issue 9, pp. 82-85

    When running online experiments, getting numbers is easy; getting numbers you can trust is hard.

  • Controlled Experiments on the Web: Survey and Practical Guide

    Data Mining and Knowledge Discovery journal, Vol 18(1), p. 140-181

    The web provides an unprecedented opportunity to evaluate ideas quickly using controlled experiments, also called randomized experiments, A/B tests (and their generalizations), split tests, Control/Treatment tests, MultiVariable Tests (MVT) and parallel flights. Controlled experiments embody the best scientific design for establishing a causal relationship between changes and their influence on user-observable behavior. We provide a practical guide to conducting online experiments, where end users can help guide the development of features. Our experience indicates that significant learning and return-on-investment (ROI) are seen when development teams listen to their customers, not to the Highest Paid Person’s Opinion (HiPPO). We provide several examples of controlled experiments with surprising results. We review the important ingredients of running controlled experiments, and discuss their limitations (both technical and organizational). We focus on several areas that are critical to experimentation, including statistical power, sample size, and techniques for variance reduction. We describe common architectures for experimentation systems and analyze their advantages and disadvantages. We evaluate randomization and hashing techniques, which we show are not as simple in practice as is often assumed. Controlled experiments typically generate large amounts of data, which can be analyzed using data mining techniques to gain deeper understanding of the factors influencing the outcome of interest, leading to new hypotheses and creating a virtuous cycle of improvements. Organizations that embrace controlled experiments with clear evaluation criteria can evolve their systems with automated optimizations and real-time analyses. Based on our extensive practical experience with multiple systems and organizations, we share key lessons that will help practitioners in running trustworthy controlled experiments.

  • Online Experimentation at Microsoft

    Microsoft ThinkWeek Paper, recognized as top 30

    Controlled experiments, also called randomized experiments and A/B tests, have had a profound influence on multiple fields, including medicine, agriculture, manufacturing, and advertising. Through randomization and proper design, experiments allow establishing causality scientifically, which is why they are the gold standard in drug tests. In software development, multiple techniques are used to define product requirements; controlled experiments provide a valuable way to assess the impact of new features on customer behavior. At Microsoft, we have built the capability for running controlled experiments on web sites and services, thus enabling a more scientific approach to evaluating ideas at different stages of the planning process. In our previous papers, we did not have good examples of controlled experiments at Microsoft; now we do! The humbling results we share bring to question whether a-priori prioritization is as good as most people believe it is. The Experimentation Platform (ExP) was built to accelerate innovation through trustworthy experimentation. Along the way, we had to tackle both technical and cultural challenges and we provided software developers, program managers, and designers the benefit of an unbiased ear to listen to their customers and make data-driven decisions. A technical survey of the literature on controlled experiments was recently published by us in a journal (Kohavi, Longbotham, Sommerfield, & Henne, 2009). The goal of this paper is to share lessons and challenges focused more on the cultural aspects and the value of controlled experiments.

  • An Empirical Comparison of Voting Classification Algorithms: Bagging, Boosting, and Variants

    Machine Learning journal, Vol 36, Nos. 1/2, pages 105-139

    Methods for voting classification algorithms, such as Bagging and AdaBoost, have been shown to be very successful in improving the accuracy of certain classifiers for artificial and real-world datasets. We review these algorithms and describe a large empirical study comparing several variants in conjunction with a decision tree inducer (three variants) and a Naive-Bayes inducer. The purpose of the study is to improve our understanding of why and when these algorithms, which use perturbation, reweighting, and combination techniques, affect classification error. We provide a bias and variance decomposition of the error to show how different methods and variants influence these two terms. This allowed us to determine that Bagging reduced variance of unstable methods, while boosting methods (AdaBoost and Arc-x4) reduced both the bias and variance of unstable methods but increased the variance for Naive-Bayes, which was very stable. We observed that Arc-x4 behaves differently than AdaBoost if reweighting is used instead of resampling, indicating a fundamental difference. Voting variants, some of which are introduced in this paper, include: pruning versus no pruning, use of probabilistic estimates, weight perturbations (Wagging), and backfitting of data. We found that Bagging improves when probabilistic estimates in conjunction with no-pruning are used, as well as when the data was backfit. We measure tree sizes and show an interesting positive correlation between the increase in the average tree size in AdaBoost trials and its success in reducing the error. We compare the mean-squared error of voting methods to non-voting methods and show that the voting methods lead to large and significant reductions in the mean-squared errors. Practical problems that arise in implementing boosting algorithms are explored, including numerical instabilities and underflows.

  • Wrappers for Feature Subset Selection

    Artificial Intelligence journal (97)

    In the feature subset selection problem, a learning algorithm is faced with the problem of selecting a relevant subset of features upon which to focus its attention, while ignoring the rest. To achieve the best possible performance with a particular learning algorithm on a particular training set, a feature subset selection method should consider how the algorithm and the training set interact. We explore the relation between optimal feature subset selection and relevance. Our wrapper method searches for an optimal feature subset tailored to a particular algorithm and a domain. We study the strengths and weaknesses of the wrapper approach and show a series of improved designs. We compare the wrapper approach to induction without feature subset selection and to Relief, a filter approach to feature subset selection. Significant improvement in accuracy is achieved for some datasets for the two families of induction algorithms used: decision trees and Naive Bayes.
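
    (An illustrative code sketch of wrapper-style forward selection appears at the end of the Publications section below.)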

  • A Study of Cross-Validation and Bootstrap for Accuracy Estimation and Model Selection

    IJCAI

    We review accuracy estimation methods and compare the two most common methods: cross-validation and bootstrap. Recent experimental results on artificial data and theoretical results in restricted settings have shown that for selecting a good classifier from a set of classifiers (model selection), ten-fold cross-validation may be better than the more expensive leave-one-out cross-validation. We report on a large-scale experiment -- over half a million runs of C4.5 and a Naive-Bayes algorithm -- to estimate the effects of different parameters on these algorithms on real-world datasets. For cross-validation, we vary the number of folds and whether the folds are stratified or not; for bootstrap, we vary the number of bootstrap samples. Our results indicate that for real-world datasets similar to ours, the best method to use for model selection is ten-fold stratified cross-validation, even if computation power allows using more folds.
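
    (An illustrative code sketch of ten-fold stratified cross-validation appears at the end of the Publications section below.)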

  • Supervised and Unsupervised Discretization of Continuous Features

    Machine Learning

    Many supervised machine learning algorithms require a discrete feature space. In this paper, we review previous work on continuous feature discretization, identify defining characteristics of the methods, and conduct an empirical evaluation of several methods. We compare binning, an unsupervised discretization method, to entropy-based and purity-based methods, which are supervised algorithms. We found that the performance of the Naive-Bayes algorithm significantly improved when features were discretized using an entropy-based method. In fact, over the 16 tested datasets, the discretized version of Naive-Bayes slightly outperformed C4.5 on average. We also show that in some cases, the performance of the C4.5 induction algorithm significantly improved if features were discretized in advance; in our experiments, the performance never significantly degraded, an interesting phenomenon considering the fact that C4.5 is capable of locally discretizing features.

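The CUPED paper above (Improving the Sensitivity of Online Controlled Experiments by Utilizing Pre-Experiment Data) adjusts each user's in-experiment metric using the same metric measured before the experiment. Below is a minimal Python sketch of that adjustment; the synthetic data, variable names, and constants are illustrative assumptions, not taken from the paper.

    import numpy as np

    def cuped_adjust(y, x):
        # y: in-experiment metric per user; x: same metric for the same user, pre-experiment.
        # theta is the regression slope of y on x; subtracting theta * (x - mean(x)) keeps
        # the mean of y but removes the variance explained by the pre-experiment covariate.
        theta = np.cov(x, y)[0, 1] / np.var(x, ddof=1)
        return y - theta * (x - x.mean())

    # Synthetic illustration: a pre-period metric highly correlated with the in-experiment metric.
    rng = np.random.default_rng(0)
    pre = rng.normal(10.0, 3.0, size=100_000)
    post = pre + rng.normal(0.0, 1.0, size=100_000)
    adjusted = cuped_adjust(post, pre)
    print(post.var(), adjusted.var())  # adjusted variance ~ var(post) * (1 - corr(pre, post)**2)

With theta chosen this way, the adjusted metric has variance of roughly var(y) * (1 - rho^2), so the ~50% variance reduction reported in the abstract corresponds to a pre/post correlation of about 0.7.
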
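The wrapper paper above (Wrappers for Feature Subset Selection) scores candidate feature subsets with the learning algorithm itself rather than with an algorithm-independent filter. Below is a simplified greedy forward-selection sketch of that idea; the dataset, classifier, and stopping rule are illustrative assumptions rather than the search strategies evaluated in the paper.

    import numpy as np
    from sklearn.datasets import load_iris
    from sklearn.model_selection import cross_val_score
    from sklearn.naive_bayes import GaussianNB

    def wrapper_forward_selection(X, y, estimator):
        # Greedy forward search: at each step add the feature whose inclusion yields the best
        # cross-validated accuracy for this particular estimator (the "wrapper" idea).
        remaining = list(range(X.shape[1]))
        selected, best_score = [], -np.inf
        while remaining:
            scored = [(cross_val_score(estimator, X[:, selected + [f]], y, cv=5).mean(), f)
                      for f in remaining]
            score, feature = max(scored)
            if score <= best_score:   # stop when no remaining feature improves the estimate
                break
            selected.append(feature)
            remaining.remove(feature)
            best_score = score
        return selected, best_score

    X, y = load_iris(return_X_y=True)
    print(wrapper_forward_selection(X, y, GaussianNB()))
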
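The cross-validation study above (A Study of Cross-Validation and Bootstrap for Accuracy Estimation and Model Selection) recommends ten-fold stratified cross-validation for model selection. Below is a small sketch of that procedure using scikit-learn; the iris dataset and the Gaussian Naive-Bayes classifier are stand-ins assumed for illustration, not the C4.5 and Naive-Bayes inducers used in the paper.

    import numpy as np
    from sklearn.datasets import load_iris
    from sklearn.model_selection import StratifiedKFold
    from sklearn.naive_bayes import GaussianNB

    X, y = load_iris(return_X_y=True)

    # Stratified folds preserve the class proportions of the full dataset in every fold.
    skf = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
    scores = []
    for train_idx, test_idx in skf.split(X, y):
        model = GaussianNB().fit(X[train_idx], y[train_idx])
        scores.append(model.score(X[test_idx], y[test_idx]))

    # The mean held-out accuracy is the cross-validated estimate used to compare candidate models.
    print(f"10-fold stratified CV accuracy: {np.mean(scores):.3f} +/- {np.std(scores):.3f}")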

Patents

  • Changing results after back button use or duplicate request

    Issued US 9,129,018

    Enhancements of the user experience are provided when a user returns to a previously viewed page, such as a previously viewed page of search results. When a user returns to a previously viewed page, additional context information from a user's actions since the initial view of a page can be used to modify the previously viewed page and/or obtain a new version of the previously viewed page. In situations where the previously viewed page corresponds to a page of responsive results from a search engine, the modified and/or new version of the search engine results page can include an expanded or reduced group of results, different types of results, different rankings for existing results, or a combination thereof.

  • Active HIP

    Issued US 8,433,916

    Computing services that unwanted entities may wish to access for improper, and potentially illegal, use can be more effectively protected by using Active HIP systems and methodologies. An Active HIP involves dynamically swapping one random HIP challenge, e.g., but not limited to, image, for a second random HIP challenge, e.g., but not limited to, image. An Active HIP can also, or otherwise, involve stitching together, or otherwise collecting and including, within Active HIP software, i.e., a HIP web page, to be executed by a computing device of a user seeking access to a HIP-protected computing service x number of software executables randomly selected from a pool of y number of software executables. The x number of software executables, when run, generates a random Active HIP key. If the generated Active HIP key accompanies a correct user response to the valid HIP challenge the system and/or methodology can assume with a degree of certainty that the current user is a legitimate human user and allow the current user access to the requested computing service.

  • Method and System for Determining Whether an Offering is Controversial Based on User Feedback

    Issued US 8,412,557

    The controversiality of an offering in a computer implemented system is computed based on user satisfaction feedback. A controversiality index can be provided to indicate the extent to which the offering is controversial.

  • Continuous usability trial for a website

    Issued US 8,185,608

    A continuous website trial allows ongoing observation of user interactions with a website for an indefinite period of time that is not ascertainable at initiation of the trial. Users are randomly assigned to either a control group or one or more test groups. The control and test groups are served different sets of web pages, even though they access the same website. During the trial, the web pages for the control group are held constant over time, while the web pages for the test group(s) undergo multiple modifications at separate occasions over time. As the web pages for the test group(s) are modified, statistical data collection continues to learn how user behavior changes as a result of the modifications. The statistical data obtained from the users of the various groups may be compared and contrasted and used to gain a better understanding of customer experience with the website.

    Other inventors
    • Jeremy York
  • Detection of behavior-based associations between search strings and items

    Issued US 8,112,429

    A system and method are disclosed for automatically detecting associations between particular sets of search criteria, such as particular search strings, and particular items. Actions of users of an interactive system, such as a web site, are monitored over time to generate event histories reflective of searches, item selection actions, and possibly other types of user actions. An analysis component collectively analyzes the event histories to automatically identify and quantify associations between specific search strings (or other types of search criteria) and specific items. As part of this process, a decay function reduces the weight given to a post-search item selection event based on intervening events that occur between the search event and the item selection event.
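
    (An illustrative code sketch of decay-based weighting appears at the end of the Patents section below.)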

  • Bayes rule based and decision tree hybrid classifier

    US 6,026,399

    The present invention provides a hybrid classifier, called the NB-Tree classifier, for classifying a set of records

  • Method system and computer program product for visualizing an evidence classifier

    US 6,460,049

    A method, system, and computer program product visualizes the structure of an evidence classifier.

  • Method, system, and computer program product for visualizing a decision-tree classifier

    US 6,278,464

    A method, system and a computer program product for visualizing a decision-tree classifier are provided

  • Strategies for providing diverse recommendations

    US 7,542,951

    Strategies are described for generating recommendations

  • Strategies for providing novel recommendations

    US 7,584,159

    Strategies are described for generating recommendations

  • System and method for selection of important attributes

    US 6,026,399

    A system and method determines how well various attributes in a record discriminate different values of a chosen label attribute.

    Other inventors
    • Dan Sommerfield
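
The behavior-based associations patent above reduces the weight of a post-search item selection as more user events intervene between the search and the selection. The patent abstract does not specify a functional form; the sketch below uses a hypothetical exponential decay purely to illustrate the idea.

    def association_weight(base_weight: float, intervening_events: int, decay: float = 0.8) -> float:
        # Each event between the search and the item selection multiplies the evidence for the
        # search-string -> item association by a decay factor (0.8 is an illustrative assumption).
        return base_weight * (decay ** intervening_events)

    print(association_weight(1.0, 0))  # selection right after the search: full weight (1.0)
    print(association_weight(1.0, 5))  # selection after five intervening events: ~0.33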

Honors & Awards

  • 59 A/B testing influencers you need to follow in 2023

    Kameleoon

  • 60 influencers in A/B testing you need to follow in 2022

    Kameleoon

    https://www.kameleoon.com/en/blog/60-influencers-ab-testing-you-need-follow-2022

  • 82 influencers in A/B testing that you need to know in 2021

    Kameleoon

    https://www.kameleoon.com/en/blog/top-ab-testing-influencers

  • Over 50,000 citations to papers

    https://scholar.google.com/citations?hl=en&user=O3RYHGwAAAAJ&view_op=list_works&pagesize=100

  • Experimentation lifetime achievement award

    https://experimentationcultureawards.com/

    https://experimentationcultureawards.com/#ronnykohavi

    https://www.linkedin.com/posts/ronnyk_expca2020-abtest-experimentguide-activity-6714985210947207168-7Hec

  • Quora Most Viewed Writer in A/B Testing (adjusts real-time, usually top 3)

    Quora

    https://www.quora.com/topic/A-B-Testing/writers

  • AMiner 5th most influential scholar in AI, 26th most influential scholar in Machine Learning

    https://aminer.org/mostinfluentialscholar/ai
    https://aminer.org/mostinfluentialscholar/ml


  • Forbes article: A Massive Social Experiment On You Is Under Way, And You Will Love It

    Forbes

    Quoted in http://www.forbes.com/sites/parmyolson/2015/01/21/jawbone-guinea-pig-economy/

  • IEEE Tools with Artificial Intelligence best paper award

    IEEE

    IEEE Tools With Artificial Intelligence Best Paper Award for the paper Data Mining using MLC++, a Machine Learning Library in C++ by Kohavi, Sommerfield, and Dougherty.

  • President's award (top 5%) in each year of the BA degree

    Technion

Languages

  • Hebrew

    Native or bilingual proficiency

  • English

    Native or bilingual proficiency

Organizations

  • SIGKDD
