Report #4
Report on “Statistical Challenges in Online Controlled Experiments: A Review of A/B Testing Methodology” by Larsen et al. (2023)
As the authors acknowledge, the growth and widespread adoption of the internet has made Online Controlled Experiments (OCEs) an “indispensable tool for major technology companies when it comes to maximizing revenue and optimizing the user experience”. These experiments are the online equivalent of traditional Randomized Controlled Trials (RCTs), and their rise in popularity has fostered a “culture of experimentation” outside academia. As this culture has grown, numerous statistical challenges and nuances have emerged. The primary aim of the article is therefore to review the methodologies associated with OCEs and to summarize what the current literature recommends for addressing these challenges.
One of the strengths of this paper is its comprehensive coverage of the methodologies found in the existing literature, such as control variates, stratified sampling, trigger analysis, and optional stopping. The authors summarize the essence of each topic well while maintaining academic rigor in the more complex cases, for which they direct readers to the Supplementary Material or recommend literature that covers them in more depth.
Another notable strength of the paper is its use of real-world examples to introduce each topic. The authors illustrate how businesses facing specific challenges might benefit from applying each methodology. The examples are accessible to all readers, regardless of their familiarity with the underlying mathematics, and effectively highlight the significance of the challenges these methodologies address.
A possible weakness of the paper is the level of rigor maintained outside the examples, which, as noted above, are accessible to everyone. Although the real-world applications of the methodologies are clearly presented, the technical language used to describe their implementation, together with the authors’ references to many specific scenarios within each research design and methodology, may confuse readers who are unfamiliar with the topics discussed.
Overall, this paper advances knowledge in the field by synthesizing current practices and challenges in A/B testing. The authors’ contribution lies in highlighting the need for methodological improvements to address real-world issues in A/B testing, and the paper provides a useful resource for aspiring researchers who want to explore these issues further.
Two possible next steps could further advance this line of research. First, a similar review could focus specifically on the challenges encountered in A/B/n testing, where multiple treatments are compared at once. Second, a document could be written to bridge the gap between industry and academia on this topic, which could foster a more collaborative approach.