I think there might be a natural way to progress by stepping outside of the Bayesian formalism for a bit. The question I always want to start from in any problem where probabilistic reasoning might be applicable is: “What is our state space?”
Poker has an obvious state space as a finite game. The 52 cards in a deck can be ordered in 52! ways, but we’re only interested in the 5 cards you are specifically given, and we can collapse the order of those 5 cards, so the hand space is just 52C5 = 2,598,960. The configuration B that you are interested in is that the 5 cards you have been dealt match one of 4 patterns - the TJQKA of a single suit, in any of the 4 suits. This is a tiny fraction of the poker state space, and so we say that it has a low probability in a random game.
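These counts are easy to verify directly (a quick Python sketch):

```python
from math import comb, factorial

# Orderings of a full deck: 52!
deck_orderings = factorial(52)

# Distinct 5-card hands, order collapsed: 52 choose 5
hands = comb(52, 5)      # 2,598,960

# Exactly 4 royal flushes (TJQKA in each of the 4 suits)
p_royal = 4 / hands
print(hands, p_royal)    # 2598960, roughly 1.5e-06
```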
Clearly, however, the state space of the poker game is not itself the deciding factor in your scenario, because now we introduce a Dealer. In your game you’ve been “dealt” a Dealer, who has dealt you 5 cards in some configuration. Depending on the properties of the dealer, you may have been dealt different configurations - some dealers might only deal busted flushes, others might consistently deal a mix of mid-value hands, and others might be completely fair.
So, what were the odds, given only one piece of information, that we were dealt this particular dealer (or one satisfying some more general property)? The answer to that question requires that we enumerate the state space of dealers, and look at what the information we’ve received about this hand tells us about what possible dealers we might have been dealt.
What is a dealer? A hugely simplified model would say that a dealer, in the abstract, is a one-time function from the set of 52 cards to a 5-card hand. In this view it should be clear that there are exactly as many dealers as there are 5-card hands, and so the probability that you would get this hand from this dealer is always 1. However, this only allows for dealers who would then continue to deal you this exact hand. So this is maybe not so interesting as a model.
It seems like we could propose that dealers themselves model a local probability distribution. Each dealer deals each possible hand with some probability: a “Fair” dealer assigns every hand the same probability, a “Constant” dealer deals exactly one hand with probability 1, and a “Weighted” dealer deals some hands more often than others.
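As a sketch of these three dealer types (using a toy five-hand space standing in for the full 52C5 hands; the names and weights here are purely illustrative):

```python
import random

# Toy hand space: stand-in for the 2,598,960 real 5-card hands
HANDS = ["busted flush", "pair", "two pair", "straight", "royal flush"]

def fair_dealer():
    # every hand equally likely
    return random.choice(HANDS)

def constant_dealer(hand="royal flush"):
    # deals exactly one hand with probability 1
    return hand

def weighted_dealer(weights=(10, 5, 3, 1, 1)):
    # some hands come up more often than others
    return random.choices(HANDS, weights=weights, k=1)[0]
```

A real dealer in this model is just a point in the space of probability vectors over all hands; these three functions pick out three special regions of that space.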
This is a wholly mathematical model, so there seems to be a basis for some discussion of principles independently of any given prior configuration! Instead of asking “what should our starting Bayes weight values be”, we say “we know we’re somewhere in this particular state space, and want to talk abstractly about what conclusions we can draw given particular events”.
Unlike the simple Poker model, though, progressing an answer to your question is tricky given that this model is uncountable. Each dealer is a point in a space of roughly 2.6 million dimensions (one coordinate per possible hand), and each coordinate ranges over the open interval (0,1). We have continuum many such possible dealers, so even acknowledging that the probabilities of all of the possible poker hands sum to unity, trying to talk in such absolute generality will need some powerful analysis.
——
So, let’s take this core idea and apply it to a simpler game. Cho-Han dice rolling has a similar concept, but we’re only interested in odds versus evens. Each dealer has some weighting between evens and odds, and the probabilities of those two outcomes sum to unity.
We still have a similar problem, but the analysis involved is much more tractable: each dealer can be modelled as a single value in (0,1) representing the probability of rolling an Even number. Let’s call this P(i)(E), the chance that dealer i rolls Even.
Again, there are continuum many possible dealers. However, when we roll the dice and the outcome comes up Even, we start to know something about where in the probability distribution we are. It is more likely that we are in a world where our dealer skews Even than it is that they skew Odd. It is impossible that we might have gotten this result from a dealer that only throws Odd, for example.
For a very loose thought, if we were to line up all of our P(i)(E) probability functions on a graph, we’d end up with a 1x1 square of possibilities. If each of these possibilities were in some general sense “equally random”, but we know that the result of the throw was Even, then we can state some general results about the chance that our random point fell within a given weighted throw. For example, the area under this graph is halved at 1/root(2), rather than (as intuitions sometimes suggest) at 0.5.
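The 1/root(2) halving point can be checked numerically. With a uniform prior on the dealer’s bias p, conditioning on one observed Even weights each candidate dealer by p, which we can do by rejection sampling (a Monte Carlo sketch, not a proof):

```python
import random
from statistics import median

# Prior: dealer bias p = P(Even) ~ Uniform(0, 1).
# Condition on one observed Even by rejection sampling:
# accept a candidate dealer with probability p (the likelihood of Even).
accepted = []
while len(accepted) < 200_000:
    p = random.random()
    if random.random() < p:      # this dealer produced an Even
        accepted.append(p)

print(median(accepted))          # close to 1/root(2) ~ 0.707, not 0.5
```

Analytically: the posterior density is 2p, so the posterior CDF is p^2, which reaches 1/2 exactly at p = 2^(-1/2).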
(This is basically a much simplified sketch of what Bayesian Learning does in each update, and taking integrals is what we do when we want to make predictions or classifications using our learning to date.)
Now this isn’t the same as saying the thrower is confirmed biased. In fact our model pretty much assumes a probability of zero that any given thrower has a bang-on 50:50 odds/evens chance (though they can be arbitrarily close). The point, rather, is simply that we can start to assign numbers to just how much bias they have shown given the outcomes they have demonstrated thus far.
And if, as we often want to do in Bayesian learning, we are interested in using past data as a measure of prediction and a suggestion for future action, this bias measure gives us a first glimpse at what we might expect to happen next.
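With a uniform prior on the bias, this prediction step has a closed form: Laplace’s rule of succession. After e Evens and o Odds the posterior is Beta(e+1, o+1), and its mean is the predicted chance of the next Even. A sketch (`predict_even` is a hypothetical helper name):

```python
# Rule-of-succession sketch: with a uniform prior on the dealer's bias,
# after `evens` Evens and `odds` Odds the posterior is Beta(evens+1, odds+1),
# and the predicted probability of the next roll being Even is its mean.
def predict_even(evens, odds):
    return (evens + 1) / (evens + odds + 2)

print(predict_even(1, 0))   # 2/3 after a single observed Even
print(predict_even(10, 2))  # 11/14 after a longer Even-heavy run
```

So even one Even shifts our best guess for the next roll from 1/2 to 2/3, which is the “first glimpse at what we might expect to happen next”.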
——
So, starting to push back towards the point at hand: what if instead we talked about a d3, and asked about a thrown result of 3? Well, we’re adding an extra dimension, because we’re adding another free variable (each dealer can now vary their biases for a 1 or a 2 in addition to the bias for a 3), so now we’re talking about volumes rather than areas when we take integrals.
This actually suggests we’re more likely to be weighted towards 3 (remaining non-committal about the other two values) than in the Evens case, since 2^(-1/3) is bigger than 2^(-1/2). And that makes sense, because getting a 3 on a d3 is in a sense more surprising than getting Evens on a dice roll.
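For what it’s worth, the two halving points from this loose sketch compare as claimed (a trivial numeric check; this verifies only the arithmetic, not the underlying geometry):

```python
# Halving points from the loose geometric sketch
d2_point = 2 ** (-1 / 2)   # Evens case, about 0.707
d3_point = 2 ** (-1 / 3)   # d3 case, about 0.794

print(d2_point, d3_point, d3_point > d2_point)
```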
I really do not have the grasp of analysis needed to translate this to the Poker case. However, there absolutely seems to be some foundation to the idea that the more apparently unlikely a scenario we encounter in an unfamiliar game, the more likely we ought to consider it that the dealer is weighted in favour of that outcome. And this is a property we ought to expect to hold regardless of our choice of priors.
Any conclusion we might draw is itself only probabilistic. And we haven’t considered more complex models of dealership - the Markov Chain model Noah suggests allows not just for a set probability for each state but also the ability to vary distributions over time, such that our dealers could switch up their strategies. This analysis gets very mathematically interesting in even higher dimensions.
But with only one piece of evidence about how a dealer is weighted, and assuming all possible dealers were equally likely to begin with in this one-time game, the next hand is more likely to be similar to the one we saw than not.