Advanced Topics

In the realm of experimentation, there are situations where you may need to test more than one variant against a control group or employ multiple metrics to evaluate the success of your experiments. While these techniques can provide valuable insights, it's essential to understand the potential challenges and consider best practices to maintain the integrity of your experimentation process.

  1. Bonferroni Correction

    Bonferroni Correction is a statistical method used to control the family-wise error rate when conducting multiple tests simultaneously. When testing multiple variants or metrics, each additional test increases the likelihood of making a Type I error (false positive).
    • Family-wise Error Rate: Understand that when you test multiple hypotheses, the overall error rate accumulates. The Bonferroni Correction divides the significance level (alpha) by the number of tests to reduce the chance of falsely identifying a significant result.
    • Adjusted Significance Level: Adjust your significance level for each individual test to maintain a consistent family-wise error rate. For example, if you initially set alpha at 0.05 and you conduct 10 tests, you would use an adjusted alpha of 0.005 (0.05 divided by 10) for each test.
    • Balancing Significance and Power: Keep in mind that the Bonferroni Correction is conservative and can reduce the power of your experiments. Striking a balance between significance level and statistical power is crucial.
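The adjustment described above can be sketched in a few lines. This is an illustrative example with hypothetical p-values, not platform code:

```python
# Illustrative sketch of the Bonferroni Correction.
# Each test's p-value is compared against alpha / number_of_tests.

def bonferroni_significant(p_values, alpha=0.05):
    """Return a boolean per test: True where the result is significant
    after dividing alpha by the number of tests."""
    adjusted_alpha = alpha / len(p_values)
    return [p < adjusted_alpha for p in p_values]

p_values = [0.001, 0.02, 0.04, 0.30]      # hypothetical results of 4 tests
print(bonferroni_significant(p_values))   # adjusted alpha = 0.05 / 4 = 0.0125
# -> [True, False, False, False]
```

Note that the second and third tests would have passed an unadjusted 0.05 threshold; the correction is exactly what prevents those borderline results from being declared significant.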
  2. Testing Multiple Variants Against Control

    When conducting experiments involving multiple variants tested against a control group, consider the following:
    • Defining Success: Clearly define the criteria for success. Determine whether you are looking for any variant that outperforms the control (e.g., Any Variant Beats Control) or whether specific variants must surpass the control. Keep in mind that the Glassfy Experimentation Platform operates under the former hypothesis, concluding the test as soon as any variant significantly outperforms the control.
    • Statistical Significance: Apply statistical tests, such as A/B/C testing, to identify which variants, if any, are statistically superior to the control group. Use the Bonferroni Correction when testing multiple variants to reduce the risk of false positives. Remember, though, that the more variants you test, the lower the statistical power of the test will be.
    • Practical Significance: While statistical significance is important, also consider the practical significance of the observed differences. A small statistical difference may not have a substantial impact on your objectives.
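Putting the points above together, a multi-variant comparison against control might look like the following sketch. The conversion counts are hypothetical, and the two-proportion z-test shown here is one common choice, not necessarily the test the platform uses:

```python
import math

def two_proportion_p_value(conv_a, n_a, conv_b, n_b):
    """Two-sided p-value for a two-proportion z-test with pooled variance."""
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (conv_b / n_b - conv_a / n_a) / se
    # Normal CDF via the error function; two-sided p-value
    return 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

control = (120, 4000)                              # (conversions, users), hypothetical
variants = {"B": (170, 4000), "C": (135, 4000)}    # hypothetical variants

alpha = 0.05 / len(variants)                       # Bonferroni-adjusted threshold
for name, (conv, n) in variants.items():
    p = two_proportion_p_value(*control, conv, n)
    verdict = "significant" if p < alpha else "not significant"
    print(f"variant {name}: p = {p:.4f} -> {verdict}")
```

With these numbers, variant B clears the adjusted threshold while variant C does not, even though C also converts better than control in absolute terms: a concrete illustration of the difference between an observed lift and a statistically significant one.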
  3. Multiple Control Metrics

    Utilizing multiple control metrics can provide a more comprehensive view of your experiment's impact. However, it's advisable to have one primary control metric to guide your decisions:
    • Primary Control Metric: Designate one control metric as the primary metric that governs the decision-making process. It should align with your primary experiment objective. This metric is used to determine the success or failure of the experiment.
    • Secondary Control Metrics: You can employ additional control metrics to gain a broader perspective on the experiment's impact. These metrics provide context and help identify unexpected side effects or trends.
    • Significance Threshold: Set a significance threshold for the primary control metric, and ensure that any observed differences meet this threshold before triggering decisions. In the Glassfy Experimentation Platform, you cannot have multiple primary control metrics by design, so the Bonferroni Correction is not needed here.
    • Monitoring Secondary Metrics: While secondary metrics can provide valuable insights, avoid making decisions based solely on them. Use them for exploratory analysis and understanding user behavior.
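The decision logic above can be made explicit: only the primary metric drives the ship/hold decision, while secondary metrics are surfaced as flags for exploratory follow-up. This is a hypothetical sketch; the metric names and p-values are illustrative:

```python
# Hypothetical sketch: the ship/hold decision is driven solely by the
# primary control metric; secondary metrics are only flagged for review.

def evaluate_experiment(primary_p, secondary_ps, alpha=0.05):
    """Return the decision plus per-metric flags for secondary metrics.
    Flags indicate a notable difference worth exploring, not a decision."""
    decision = "ship" if primary_p < alpha else "hold"
    flags = {name: p < alpha for name, p in secondary_ps.items()}
    return decision, flags

decision, flags = evaluate_experiment(
    primary_p=0.01,                                   # hypothetical p-values
    secondary_ps={"retention_d7": 0.20, "arpu": 0.03},
)
print(decision, flags)  # ship {'retention_d7': False, 'arpu': True}
```

Even though `arpu` is flagged here, the decision comes from the primary metric alone; the flag is a prompt for deeper analysis, not an automatic action.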

By implementing the Bonferroni Correction, carefully testing multiple variants against a control, and designating one primary control metric, you can maintain the statistical rigor and integrity of your experimentation process. These advanced techniques allow you to explore and evaluate multiple aspects of your mobile app while reducing the risk of drawing false conclusions.

  4. The Novelty Effect

    The Novelty Effect is a phenomenon where users tend to exhibit different behavior when they encounter a change in an app or service. This change could be a new feature, pricing model, or design update. When existing users interact with a new feature or pricing model, their behavior may be influenced by the novelty of the change. For instance, they may be more engaged initially due to curiosity, but this behavior might not be sustainable.
    Why It Matters:
    • The Novelty Effect can lead to misleading results in your experiments. When you include all users, including existing ones, in your pricing or design experiments, you risk conflating the genuine impact of your changes with the temporary effects of novelty.
    • To obtain meaningful and accurate insights, it is crucial to isolate the response of New Users from existing users. New users are less likely to be influenced by prior experiences with the app and can provide a more genuine representation of how your changes impact user behavior.
    • By focusing on New Users, you can better understand the long-term effects of your pricing and design decisions, ensuring that the observed improvements are not merely driven by the temporary fascination of existing users with something new.
    • Additionally, segmenting New Users can help you tailor your onboarding experiences, pricing strategies, and design to make a positive, lasting impression from the moment users first engage with your app.
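Isolating New Users typically amounts to filtering the experiment cohort by first-seen date. The sketch below is illustrative; the field names and dates are hypothetical, not the platform's data model:

```python
from datetime import date

# Hypothetical sketch: keep only users whose first session falls on or
# after the experiment start, so results are not skewed by the Novelty
# Effect among existing users.

def new_user_cohort(users, experiment_start):
    """Return only the users first seen on or after the experiment start."""
    return [u for u in users if u["first_seen"] >= experiment_start]

start = date(2024, 1, 15)                          # hypothetical start date
users = [
    {"id": 1, "first_seen": date(2023, 11, 2)},    # existing user -> excluded
    {"id": 2, "first_seen": date(2024, 1, 20)},    # new user -> included
]
print([u["id"] for u in new_user_cohort(users, start)])  # -> [2]
```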