
This is a hypothetical question, not related to an actual issue I have, so it is just out of curiosity. I'm aware of the related question What should I do when a confidence interval includes an impossible range of values?, but I think the details of what I have in mind are different.

Say I want to estimate the proportion of something in a population. I know from qualitative studies that this "something" definitely exists in the population, even though it is rare (how rare, I don't really know, apart from the couple of observations described in the few qualitative studies on the subject). I plan to compute binomial confidence intervals (e.g. Wilson confidence intervals) to get an estimate of the plausible proportions in the population.

However, after randomly sampling a few thousand observations from the population, I fail to identify any of this "something" in my sample, so the confidence interval includes 0, even though I know for a fact that zero is not a possible value and that there is nothing wrong with my sampling method (other than a sample size that is apparently not large enough).

Example with R, where "lwr.ci" is the computed lower bound of the CI:

> library("DescTools")
> BinomCI(0, 50000, conf.level = 0.95, sides = "two.sided", method = "wilson")
    
          est  lwr.ci        upr.ci
    [1,]    0       0  7.682327e-05

What are some ways to solve this problem and compute an estimate that does not include 0 in the first place? Is this a situation where I should use credible intervals instead? (If so, what would be a correct way to define the priors?)

  • "I fail to identify any of this 'something' in my sample, so the confidence interval includes 0": this conclusion with "so" is not so clear and might be clarified. If I am guessing, it sounds like you are estimating the probability of a binary outcome when this probability is very small. Is that right? Commented Jul 3 at 9:40
  • There are several questions about estimating a proportion when the observed count is zero. For example, Revisiting the Rule of Three. Commented Jul 3 at 9:43
  • "compute an estimation that does not include 0 in the first place": this is a bit tricky. If you don't want to include 0, where else do you draw the boundary? Should 0.000001 be included, or excluded as well? The approach to the problem depends on the information that you actually have. Commented Jul 3 at 9:46
  • @SextusEmpiricus Thanks, I didn't see that post! As it's a hypothetical question, I don't really have more information than that to share. So I guess (correct me if I'm wrong) that the answer really is case-by-case, and I should rather ask this question if one day I encounter the issue in a real-life situation.
    – Coris
    Commented Jul 3 at 9:51
  • @SextusEmpiricus About your first comment: yes, that's what I have in mind.
    – Coris
    Commented Jul 3 at 9:52

3 Answers


It is not a problem.

Confidence intervals display the range of estimates/hypotheses that are supported by the data; equivalently, the values outside the interval are those that are falsified by the data.

Confidence intervals are an indication of the strength of the support that the data/observations give to different hypotheses/values. If the interval is large, so that the data are plausible under many different hypotheses, then the support for any specific value is not very special. The more values that are excluded and not supported, the more interesting the support for the values that are included.

This function, qualifying the support from the data, is the same whether or not the data/interval fail to exclude something that you already know (here, the zero). The interval that contains zero (even though you know that zero is impossible) helps you see that the data don't give a lot of information about the exact probability parameter. The information is that the sample is unlikely (outside the 95% confidence interval) under hypotheses with a parameter roughly above $p \approx 3.8/n$, but $p = 0$ corresponds perfectly well with the data.
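That rough figure can be checked directly: with zero successes, the upper end of the two-sided 95% Wilson score interval reduces to $z^2/(n+z^2) \approx 3.84/n$. A minimal stand-alone sketch (in Python rather than R, so only the standard library is needed):

```python
from math import sqrt

def wilson_upper(x, n, z=1.959964):
    # upper end of the two-sided 95% Wilson score interval
    p = x / n
    denom = 1 + z * z / n
    centre = p + z * z / (2 * n)
    half = z * sqrt(p * (1 - p) / n + z * z / (4 * n * n))
    return (centre + half) / denom

# with x = 0 this reduces to z^2 / (n + z^2), i.e. roughly 3.84/n
print(wilson_upper(0, 50000))  # ~7.68e-05, matching the BinomCI output in the question
```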

If you want something different, like combining multiple sources of information (a prior and the data), then you can use a credible interval instead.


Stepping back from the complex discussion of statistical frameworks, the way I see your problem is:

  • you want an interval estimate of some probability, i.e. an interval estimate of a parameter bounded in $[0,1]$
  • you have a prior belief that this parameter is not 0: very small, but not 0
  • thus, you want a biased procedure that pushes you away from 0.

Is that a good idea, though? Isn't it simply clearer to use a standard frequentist interval here? (NB: the Wilson CI is the best choice here; the Gaussian CI is going to be terrible for $p \approx 0$.) I quite like that this approach offers the following honest, unbiased conclusion:

On the basis of a large dataset with no positive examples, we can only conclude that $p \leq p_{\max}$ (where $p_{\max}$ is the upper bound of the CI), but we can't conclude beyond that.

Personally:

  • in the context of a scientific study, I would never accept any sort of bias in the estimating procedure for this particular problem
  • I would only be happy with a biased procedure if it can be shown that the conclusion is robust to the bias (i.e. slight tweaks to the bias produce only slight tweaks to the conclusion). Here, I'd be worried that the bias would push us away from the $p=0$ edge too much.
  • Thanks. Just to be sure that I understand your answer correctly: do you mean that a simple solution, assuming we're in a frequentist framework, is to take the prior information into account when interpreting the CI (rather than when computing the CI, which seems more complicated in a frequentist framework and may create various problems)?
    – Coris
    Commented Jul 3 at 15:18
  • Yes, that's what I'm saying. We already have an interpretation which seems satisfying to me, regardless of your prior bias. In a scientific context (and perhaps overall), I want to put a huge emphasis on making sure that my methods account for and are explicit about bias, and avoid it if possible. Here, I don't see the value of adding bias, so I'd rather avoid it. Commented Jul 3 at 15:35
  • NB: biasing a CI sounds impossible, though I'm sure some unnatural construction exists. However, it is straightforward to have biased intervals in a frequentist framework. We just need to consider a formulation for interval estimation based on some sort of score function relating the interval to the true underlying $p$. It is then straightforward to consider biased estimators which put more importance on having a low score in some regions. Commented Jul 3 at 15:39

The obvious solution when you have prior information is to take a Bayesian approach. Perhaps your prior knowledge can be represented via a beta distribution, in which case things are very simple: from a $\text{Beta}(a, b)$ prior you get a $\text{Beta}(a+y, b+n-y)$ posterior distribution after observing $y$ successes out of $n$. (I'm assuming here that the population is very large, e.g. millions of people, and that your sample size $n$ is small relative to the population size; otherwise, finite-population (survey) sampling considerations would also start to matter.)
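To make the conjugate update concrete, here is a minimal sketch with made-up numbers (Python rather than R; the Beta(1, 99) prior with mean 1% and the $n = 500$ sample are purely illustrative). Since $y = 0$ and $a = 1$, the posterior is $\text{Beta}(1, m)$, whose CDF is $1-(1-p)^m$, so the equal-tailed 95% credible interval has a closed form, and its lower bound is strictly positive:

```python
# Assumed numbers for illustration: Beta(1, 99) prior (mean 1%), y = 0 successes in n = 500
a, b = 1.0, 99.0
y, n = 0, 500

# conjugate update: Beta(a + y, b + n - y)
a_post, b_post = a + y, b + n - y          # here Beta(1, 599)
posterior_mean = a_post / (a_post + b_post)

# Beta(1, m) has CDF F(p) = 1 - (1 - p)^m, so quantiles are available in closed form
m = b_post
lower = 1 - (1 - 0.025) ** (1 / m)         # 2.5% quantile, strictly positive
upper = 1 - (1 - 0.975) ** (1 / m)         # 97.5% quantile
print(posterior_mean, lower, upper)
```

Unlike the Wilson interval in the question, this equal-tailed credible interval excludes 0 by construction, because the posterior is a continuous distribution on $(0,1)$.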

Even if that doesn't fit your prior knowledge, a mixture of several beta distributions can decently approximate anything you might believe (you could, e.g., define one using the mixbeta function in the RBesT R package and then use the postmix function to update your prior with what you observed). You can also use a mixture to express greater uncertainty than you would take from the prior studies alone, going down a more weakly-informative route, so that the prior only weakly influences what you see.

Example R code:

library(RBesT)
library(patchwork)

# Mixture prior: 0.5 * Beta(1, 99) + 0.5 * Beta(0.1, 99.9)
my_prior <- mixbeta(c(0.5, 1, 99), c(0.5, 0.1, 99.9), param = "ab")
p1 <- plot(my_prior) + coord_cartesian(ylim = c(0, 100), xlim = c(0, 0.05)) + ylab("Prior density")

# Conjugate update after observing r = 0 successes out of n = 500
my_posterior <- postmix(priormix = my_prior, r = 0, n = 500)
p2 <- plot(my_posterior) + coord_cartesian(ylim = c(0, 100), xlim = c(0, 0.05)) + ylab("Posterior density")

p1 / p2  # stack the two plots with patchwork

summary(my_posterior)

[Plot: prior density (top) and posterior density (bottom)]

        mean           sd         2.5%        50.0%        97.5% 
4.143692e-04 9.994307e-04 9.435260e-23 1.143779e-04 3.464005e-03 
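The mean above can be reproduced without RBesT: under conjugacy, the posterior of a beta mixture is again a beta mixture, with each component updated as before and the weights rescaled by each component's marginal (beta-binomial) likelihood of the data. A minimal check in Python (standard library only):

```python
from math import lgamma, exp

def log_beta(a, b):
    # log of the Beta function B(a, b)
    return lgamma(a) + lgamma(b) - lgamma(a + b)

# Prior mixture from the R code above: 0.5*Beta(1, 99) + 0.5*Beta(0.1, 99.9),
# updated with r = 0 successes in n = 500 observations.
components = [(0.5, 1.0, 99.0), (0.5, 0.1, 99.9)]
r, n = 0, 500

posterior = []
for w, a, b in components:
    # marginal likelihood of the data under this component (beta-binomial;
    # the binomial coefficient is common to all components and cancels)
    log_ml = log_beta(a + r, b + n - r) - log_beta(a, b)
    posterior.append((w * exp(log_ml), a + r, b + n - r))

total = sum(w for w, _, _ in posterior)
posterior = [(w / total, a, b) for w, a, b in posterior]

mixture_mean = sum(w * a / (a + b) for w, a, b in posterior)
print(mixture_mean)  # ~4.1437e-04, matching the mean in the RBesT summary
```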
  • The Bayesian approach is best here. If seeking a frequentist approach, the good old Wilson confidence interval is hard to beat. Commented Jul 3 at 11:05
  • I will add that the Jeffreys method works well too and has a frequentist interpretation.
    – Nick Cox
    Commented Jul 3 at 11:16
  • Really nice and clear solution, thanks, this is really useful. I'll wait a bit before accepting answers, to see if other people have things to add to the discussion.
    – Coris
    Commented Jul 3 at 15:20
