The Sisters “paradox” – Counter-Intuitive Probability
Key topics
The "sisters paradox" has sparked a lively debate about counter-intuitive probability, with some commenters finding it maddeningly complex while others see it as straightforward. The discussion reveals that the key to resolving the paradox lies in understanding that the three possible outcomes - boy-boy, boy-girl, and girl-girl - are not equally likely, with boy-girl being twice as probable as the other two. As commenters dug deeper, they connected the paradox to real-world scenarios, such as analyzing subsets of data and avoiding biases in decision-making. The conversation highlights how a seemingly abstract probability problem can have practical implications for business and data analysis.
Snapshot generated from the HN discussion
Discussion activity: very active, 141 loaded comments
Story posted: Aug 28, 2025 at 9:17 AM EDT
First comment: Aug 28, 2025 at 9:17 AM EDT (0s after posting)
Peak activity: 107 comments in 0-12h (average 15.7 per period)
Latest activity: Sep 5, 2025 at 3:53 AM EDT
Not how I'd describe it. The setup is mundane enough for people to just assume that their intuition will work fine. The difference between the naive and correct answers is too small to spot in a small-n dataset. And ~0% of the population is actually familiar with analyzing such situations, for their "intuition" to be applicable.
It's a bit like Gell-Mann amnesia - people are too quick to apply an easy cognitive strategy, when (in theory) they know enough to rule that strategy out.
- boy - boy
- boy - girl
- girl - girl
So it must be 1/3 chance. If you’re looking at permutations in order, that’s a different question.
To get the right answer you must be careful about conditional probabilities (or draw out the sample space explicitly). The crux of the issue is that you are told extra information, which changes your estimate of the probability.
(This question as written is very easy to misinterpret. The Monty Hall problem, which illustrates the same thing, is better since the sample selection is much more carefully explained.)
Specifically, if you know that one is a girl, then the unordered options seem like they are back on equal footing? That is, it isn't twice as likely if you know that one ordering can't happen? (Or, stated differently, you don't know which version of two girls you are looking at.)
So, for this one, you know that either the youngest is a girl (so, girl-boy is not possible) or that the oldest is a girl (so boy-girl is out). That puts you back to the rest of the possibilities. Boy-boy is out, sure, as you have a girl. But every other path remains? So, you have one of (boy-girl(known), girl-girl(known), girl(known)-boy, girl(known)-girl). Which drops you back to 50/50?
What the problem is really saying is this:
1) You have a large collection of families with two kids of varying genders.
2) You draw one of them at random. At this point, your only estimate of P(2 girls) is 0.25.
3) Someone tells you that the family you drew has at least one girl.
4) This extra information changes your probability estimate because the possibility of two boys has been ruled out; the naive 1/4 estimate is refined to 1/3.
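The four steps above can be checked by exact enumeration. This is only a sketch: it assumes each child is independently a boy or a girl with probability 1/2 and that the family is drawn uniformly from all two-child families (the names `families` and `prob` are made up for the illustration).

```python
from fractions import Fraction
from itertools import product

# All (older, younger) combinations; assumed equally likely (1/4 each).
families = list(product("BG", repeat=2))

def prob(event, given=lambda f: True):
    """Exact P(event | given) over the equally likely families."""
    pool = [f for f in families if given(f)]
    return Fraction(sum(1 for f in pool if event(f)), len(pool))

two_girls = lambda f: f == ("G", "G")
at_least_one_girl = lambda f: "G" in f

p_prior = prob(two_girls)                               # step 2: before any information
p_posterior = prob(two_girls, given=at_least_one_girl)  # step 4: two boys ruled out
print(p_prior, p_posterior)  # 1/4 1/3
```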
The way you are interpreting it is this:
1) You have a large collection of families with two kids, at least one of whom is a girl.
2) Then the probability that the other child is a girl is clearly 50%.
As a reminder this is how the original post phrased the question:
This is just too vague and admits both interpretations; they needed to be more specific about where the family "came from." That's why Monty Hall is a better illustration: it starts with you explicitly choosing a door at random. Here the family has been chosen at random from the pool of families with two children, but that's totally unclear.
That’s not correct in general.
It’s only correct if you assume that “3) Someone tells you that the family you drew has at least one girl.” was equally likely to happen whether or not there were two girls.
That’s a quite strong assumption.
One can make different assumptions and get answers different from 1/3. For example, 1/2.
Ok, AIPedant.
I understand that you may find it irritating that someone points out that the original problem is ill-posed and “the answer” depends on how we decide to “rephrase” it.
However, it doesn’t seem boring in the context of a discussion of how the problem is not well-posed and additional assumptions are required to get an answer.
So, in the original: "a family has two children. You're told at least one of them is a girl." What are the possible states? Well, assume first born is the girl, then you have 50% that the next is a girl. Then, assume that the first born was a boy, then there is no chance and the second born is the girl that you know of. So, at 50/50 on those chances, you have 50% chance of having a 50% chance, or a 50% chance of it being 0. I can't see how to combine those to get 1/3. :(
And the Monty Hall explicitly covers the case that a decision is made on which door is shown to you. I don't see any similar framing to this problem. Yes, the total states are GB, BG, GG, but only if you treat GG in such a way that either BG or GB was not a possible state. (That is, using G for girl that you know of, and g for unknown, then possible states are GB, Gg, gG, BG. There is no version of Bg or gB that is possible, so to treat those as equal strikes me as problematic.)
Specifically: it is not true that the firstborn has a 50-50 chance of being a girl, given you were told that the family has at least one girl. The firstborn has a 2/3rds chance of being a girl. This is the heart of your confusion.
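A small check of the 2/3 figure, under the reading where the family is drawn uniformly at random and you then learn only that it is not boy-boy (an assumption, as debated throughout the thread). It also shows how the "50% of 50%" style of combination works out once the firstborn weight is 2/3 rather than 1/2.

```python
from fractions import Fraction
from itertools import product

# Families (older, younger) that survive "at least one girl": BG, GB, GG.
families = [f for f in product("BG", repeat=2) if "G" in f]

p_firstborn_girl = Fraction(sum(f[0] == "G" for f in families), len(families))
p_two_girls = Fraction(sum(f == ("G", "G") for f in families), len(families))

# Total probability: if the firstborn is a girl, the second is a girl half
# the time; if the firstborn is a boy, two girls is impossible.
combined = p_firstborn_girl * Fraction(1, 2) + (1 - p_firstborn_girl) * 0

print(p_firstborn_girl, p_two_girls, combined)  # 2/3 1/3 1/3
```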
In a broader sense there is an entire class of confusing conditional probability problems like this. Events which are causally independent in reality (e.g. gender of a child, which door Monty Hall hid the car behind) fail to be probabilistically independent when you have extra information. Yet these probability games are contrived in a way that our intuition takes over and we use our causal understanding even when a better probabilistic understanding gives you a better answer.
Consider, if you tell me that the Smiths have at least 1 girl, and I meet their daughter but you haven't met either, I have no way of knowing if I met the 1 girl or not. I could ask you, but you would just say, "I don't know, could be her. Could be the other kid. I just know they have at least 1 girl."
This is very different from the monty hall case, where the announcer knows what is behind all doors.
Similar. But different. I was using coins as an example elsewhere. If I flip a dime and a quarter, and tell you that one of them is heads, do you have increased chance of knowing if either particular one is heads? This is more liar's dice than it is monty hall.
This is not coherent as written:
"Increased chance of knowing" is nonsense. What you mean is "increased chance of being correct if I guess the dime came up heads instead of tails" and this is obviously true. Given at least one of the two coins came up as heads, the probability that the dime is heads is 2/3rds, not 1/2.
When you have a child, the odds are ~50% ... so the chance the next child is a boy or girl is almost equal. Is it because of the way it's framed that makes people think harder than they need to be?
This is like when I (very rarely) play something like "pick six".
I play 1, 2, 3, 4, 5, and 6. People think I'm crazy. They don't realize I have the same odds as any ticket they purchase.
One billion dollars goes a long way ... even after taxes!
Does anyone have some real life examples? i cannot think of any off hand but would like to be able to cite a couple if someone says "So, what is this good for?".
"When you're on TV, you need to use a DPad to laboriously type in a review. Same with the watch and its tiny screen. When you're on VR, you're using a laser pointer. That makes the barrier to leaving a review very high. As a result, people only bother to leave reviews if they have particularly strong feelings. The strongest motivating emotion is anger. Ergo, we are getting a disproportionate number of reviews from people who are angry, and everyone that is happy is not going to bother leaving a review."
Same goes for many kinds of user feedback situations. The feedback you're measuring is P(user holds that opinion | user bothers to tell you that opinion). The denominator needs to be understood in order to evaluate that probability, but by definition, you don't have any data on people who don't bother to engage with you, unless you can measure their interaction in some other way (eg. usage data). This is why online reviews of many offline businesses skew negative; why businesses will often nag you to leave a review if you're satisfied; why corporate executives often overestimate employee satisfaction, particularly in prestigious businesses (anyone who is dissatisfied will leave and get a job elsewhere), and why Congress has such a low approval rating (by definition, you can't escape the laws enacted by Congress, so there is no "exit" option, and you get an accurate appraisal of what people really think).
I.e. without "discarding", just giving some additional, but not complete, information on the random sample. Is adding information about the picked sample the same as discarding all contrarian samples? Why is this relevant?
It’s not a random family if it must have at least one girl. If you want to talk about a random family you can only make statements of the kind “one of the children is <gender>” where the gender depends on the specific family or “the family has between 0 and 2 girls”
"a random family has been sampled, the sample family has two children, one of them is a girl"?
and
"a random family has been sampled, the sample family has two children, one of them is a boy"?
and they selected each statement based on randomly picking a child from a random family then the probability actually becomes 50% boy/girl for the next child since the boy/boy or girl/girl has twice the chance of generating the above statement for the respective gender compared to the mixed gender children family.
I.e. if they say one is a girl, that statement had a 50% chance of being generated by a girl/girl family (since we pick the statement based on a random selection of one of the two children's genders and there are 2 girls, doubling the chance of a statement that one's a girl coming from a girl/girl family), there's a 25% chance the statement was generated from a girl/boy family and a 25% chance the statement was generated from a boy/girl family.
If you take 50% chance girl/girl, 25% chance boy/girl and 25% girl/boy you'll see there's a 50/50 chance of the next child being either gender.
All this due to changing how we sampled.
I go to a random house on a random street and knock on the door. A young girl opens the door. I ask how many siblings they have and they say one. What's the probability that they have a sister?
Now it's 50% even though cosmetically it seems like it'd be fair to say that the family has at least one daughter. The reason is that once I see a girl at the door, I'm slightly more confident in that it's a GG household since a GB or BG household would sometimes show a boy opening the door (assuming the two kids are equally likely to open the door).
P(GG | G at door) = P(G at door | GG) P(GG) / P(G at door)
P(G at door) = 1/2 (by symmetry)
So, P(GG | G at door) = 1 * 1/4 * 2 = 1/2
However, if the question is interpreted as "what's the probability of having two girls if we know there aren't two boys," then the event space is GB, BG, GG, and p(GG)/(p(GB) + p(BG) + p(GG)) = 1/3. Both GB and BG are in the event space because we are not conditioning on the sex of one specific child.
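Both readings above can be enumerated side by side. A rough sketch, assuming independent 50/50 sexes and that either child is equally likely to open the door (the same assumption stated in the comment); the tuple layout is just a convenience for the example.

```python
from fractions import Fraction
from itertools import product

# (older, younger, index of the child who opens the door); 8 equally likely outcomes.
outcomes = list(product("BG", "BG", (0, 1)))

def conditional(event, given):
    """Exact P(event | given) over the equally likely outcomes."""
    pool = [o for o in outcomes if given(o)]
    return Fraction(sum(1 for o in pool if event(o)), len(pool))

both_girls = lambda o: o[0] == "G" and o[1] == "G"
girl_at_door = lambda o: o[o[2]] == "G"   # condition on the child you actually saw
not_two_boys = lambda o: "G" in o[:2]     # condition only on "at least one girl"

p_door = conditional(both_girls, girl_at_door)
p_filter = conditional(both_girls, not_two_boys)
print(p_door, p_filter)  # 1/2 1/3
```

The only difference between the two numbers is the conditioning event: seeing a specific girl keeps 4 of the 8 outcomes, while "not two boys" keeps 6 of them.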
Look at the more technical descriptions using conditional probabilities of the Monty Hall problem as it is essentially equivalent. You’re trying to factor in the probability of whether Monty knows if a goat is behind a door when the observable information is that there is an open door with a goat. Once you make that observation many things collapse.
> Assume the family is selected at random because they have at least one girl.
And then again, if they sampled all families with 2 children the posterior would not change, would it?
Still assuming boy vs girls are completely iid and equally probable
Assume all houses have 2 children, and each child has equal probability of being either a boy or a girl. It will help to treat children as distinguishable, e.g. by eldest vs youngest.
* You randomly choose a house, and a random child answers the phone. You question the person who answers the phone about his/her gender and that of his/her sibling. You repeat this experiment. Given that the person who answered the phone was a girl, the conditional probability she has a sister is 1/2. Note that this is not Bertrand’s box, as the 1 girl house can occur with either the eldest or the youngest as the girl (unlike Bertrand’s box, where only 1 box has exactly 1 gold) so they cancel out and so a girl you spoke with is equally likely to have been from either a 2 girl or 1 girl house.
* You categorize houses into 2 girl, 1 girl, and 0 girl houses. You randomly pick a category, then pick a house from that category, then phone them. As before a random child answers, and you question them. Given that the person who answered the phone was a girl, the conditional probability she has a sister is 2/3. This is Bertrand’s box: you’re more likely to have spoken with a girl from a 2 girl house than a 1 girl house. Explicitly grouping by # of girls first before sampling breaks the previous symmetry.
* You randomly choose a house, and ask for the eldest child. You question him/her. You repeat this experiment. Given that the person whom you spoke with was a girl, the probability she has a sister is 1/2. Nothing new here, as seen in case (1) you were already equally likely to speak with a girl from a 2 girl vs 1 girl house, so asking for the eldest person (which by symmetry is equally likely to be a boy or a girl) doesn’t change anything here.
* You categorize houses into 2 girl, 1 girl, and 0 girl houses. You randomly pick a category, then pick a house from that category, then phone them and ask for the eldest child. Given that the person who answered the phone was a girl, the conditional probability she has a sister is 1/2. Explicitly choosing the eldest disrupts the asymmetry in Bertrand’s box: since every house has only 1 eldest which is the one you speak with, being from a two girl house no longer makes that girl more likely to have spoken with the caller.
* You randomly choose a house, and a random child answers the phone. You question the person who answers the phone. You repeat this experiment. Given that EITHER the child you spoke with OR their sibling is a girl, the probability you spoke with someone from a 2 girl house is 1/3. It might seem counterintuitive at first that loosening the criteria _reduces_ the probability of speaking with someone from a 2 girl house. But this makes sense, since there’s still only 2 ways you can speak with someone from a 2 girl house (either the eldest or youngest sister), but now 4 ways you can speak with someone from a 1 girl house, since you’re allowed to speak with the boys of that house as well.
* You randomly choose a house, and ask for the eldest child, and question them. Given that EITHER the child you spoke with OR their sibling is a girl, the probability you spoke with someone from a 2 girl house is again 1/3. Explicitly speaking with the eldest doesn’t make a difference here because we’re already conditioning on either the eldest or youngest being a girl.
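The contrast between the first two scenarios in the list (a random child answers; the house is drawn either uniformly or via a uniformly chosen category) comes down to the weights put on 0-, 1- and 2-girl houses. A hedged sketch using Bayes' rule with exact fractions; `p_sister` and the weight dictionaries are names invented for this example.

```python
from fractions import Fraction

# P(a girl answers | number of girls in the house), with a random child answering.
P_GIRL_ANSWERS = {0: Fraction(0), 1: Fraction(1, 2), 2: Fraction(1)}

def p_sister(house_weights):
    """P(2-girl house | a girl answered), given prior weights on girl counts."""
    joint = {girls: w * P_GIRL_ANSWERS[girls] for girls, w in house_weights.items()}
    return joint[2] / sum(joint.values())

# Scenario 1: house drawn uniformly, so 1-girl houses (BG or GB) are twice as common.
uniform_house = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}
# Scenario 2: category drawn uniformly first (the Bertrand's box setup).
by_category = {0: Fraction(1, 3), 1: Fraction(1, 3), 2: Fraction(1, 3)}

print(p_sister(uniform_house), p_sister(by_category))  # 1/2 2/3
```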
What's the paradox?
Edit, ok, there are things like "This statement is false.", but we should perhaps stick to "self-referential problems" with those.
I think paradoxes just exist in our theories, languages, and formal systems when we make flawed assumptions or create inconsistent frameworks. But physical reality itself just is what it is - no contradictions, just phenomena we sometimes struggle to describe accurately.
If contradictions (paradoxes) can exist, then anything becomes possible through the principle of "explosion in logic". From a contradiction, any statement can be "proven" true. The whole foundation of rational thought would be undermined. Right?
It's why we don't screen for just any condition in the general population. I.e. we just do it for 65+ y/o's, 3 packs/day smokers because there we may actually find it worth the cost of the program.
There's no contradiction anywhere in this scenario, just people's incorrect intuitions meeting (mathematical) reality.
Now if you see a boy disregard that since you can't make the statement that one is a girl.
If you see a girl go ahead and make the statement "a family has two children. You're told that at least one of them is a girl."
What is the probability now?
You have twice the chance of making that statement if you encounter a gg family over a bg/gb family right since there's one of two girls possibly answering the door amongst those families.
So 50% chance of that statement being enabled from a gg family, 25% chance coming from a bg family, 25% chance of coming from a gb family. Which means 50% chance the other child's a girl and 50% chance the other child's a boy.
The probabilities here are entirely dependent on details of the sampling which is not made explicit here.
——
You meet three people:
Alice has two children. You're told that at least one of them is a girl.
Bob has two children. You're told that at least one of them is a boy.
Csilla has two children. You're told that at least one of them is a lány. That clearly meant boy or girl because of the context, but you don’t know enough Hungarian to know what it is.
For each of them, what's the probability that they have a girl and a boy?
—-
You meet all the parents with two children in your neighborhood. Say there are 60 such families.
For 30 of them you’re told that at least one of them is a boy. What's the probability that they have a girl and a boy?
For the other 30 you’re told that at least one of them is a girl. What's the probability that they have a girl and a boy?
> A simpler question
> Let's image you're asked a simpler question.
> A family has two children. What's the probability both are girls?
"Mrs. Chance has two children of different ages. At least one of them is a boy born on Tuesday. What is the probability that both of them are boys?"
(note: it is a puzzle, not a biology or data demography problem. so there are 50/50 independence assumptions on gender and uniform day of week assumptions prior to adding the conditioning.)
Furthermore, “of different ages” is likely intended to exclude the case of twins. However, even with twins, one is generally nominally older than the other. (Not to mention that it’s possible for two non-twin siblings to be the same age in years, at certain points in time.) Why not just say “that aren’t twins”?
I loathe when logic puzzles are obscured by ambiguous language, turning them more into “gotcha” text interpretation riddles than logic puzzles.
Of course colloquially twins are the same age, but we are talking about a mathematical puzzle about probabilities here, where precision is paramount.
I can't speak for Winkler, but both he and Jaynes implicitly separate the reading of the puzzle from the work. Winkler starts his book with a few awful "reading trick ones", but in the explanations gives a few reading directions to try and avoid that going forward. I happen to know he meant "on a Tuesday." But a correct solution to a different read would be a correct solution even if it doesn't match the book text. I don't think he was trying to set a text trap, it is just hard to be clear, concise, and unambiguous at the same time. (Even "on a Tuesday" isn't completely clear if it means "all I am telling you was the day of week was Tuesday" versus "it was a very specific Tuesday, that I am not telling.")
This does require a rather skilled interviewer, so the benefits may well not be worth it. But it can be very interesting information to have.
The “thinking ways” that allow one to solve the problem can be considered to be socio-normative and neuro-typical; normally these fit white patriarchal modalities.
The mental modalities that make it harder to solve such problems are those related to sequence memory weaknesses, comprehension weaknesses, stress factors, attention weaknesses, social differences, exposure, culture, education. So dyslexics, ADHD, Autistics, socio deprived (poorer backgrounds), may struggle with tums like this that assume a consistent world view - when in fact they likely have other strengths in problem solving. It’s not a one size fit all.
Additionally like IQ, ability to solve these types of problem is down to either natural ability or practise in the domain - that is you can increase your IQ by training against the core elements IQ tests look at.
I tend to get candidates to take me through something they know well, or love, or have solved, and then I ask them about how they did it. This shows me genuinely how good they’ll be at the job in hand.. and is why my teams are actually diverse.
Excluding twins is so that we can assume the probability of each day of the week is independent.
https://www.online-python.com/RueVd2514m
No matter how I frame or interpret this question, the birthdays and birth order appear completely irrelevant - the results are still ~0.50 as expected. Whatever the author was trying to say, they didn't communicate it well. I'm really curious exactly what word or phrase the author thought I was supposed to take to mean something else. If someone could edit one of these simulations to show what the author intended then that would probably be the clearest way.
* Take all families with two children
* Take a subset where at least one child is a boy born on Tuesday
* Take a subset of the previous subset where all children are boys
* The share of the last subset (both boys) relative to the one before it (at least one boy born on Tuesday) is around 48%, i.e. 13/27
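The filtering steps above can be carried out exactly rather than by simulation. A minimal sketch, assuming sex and weekday are independent and uniform (the puzzle's stated assumptions); the weekday index chosen for Tuesday is arbitrary.

```python
from fractions import Fraction
from itertools import product

# One child = (sex, weekday); 14 equally likely types, hence 196 ordered families.
children = list(product("BG", range(7)))
families = list(product(children, repeat=2))

TUESDAY = 1  # arbitrary label; any fixed weekday gives the same result

# Subset 1: at least one child is a boy born on Tuesday.
has_tuesday_boy = [f for f in families if ("B", TUESDAY) in f]
# Subset 2: of those, both children are boys.
both_boys = [f for f in has_tuesday_boy if all(sex == "B" for sex, _ in f)]

share = Fraction(len(both_boys), len(has_tuesday_boy))
print(len(has_tuesday_boy), len(both_boys), share, float(share))
```

The key point is that the filter is "at least one child matches", not "the first child matches". A simulation that fixes which child is the Tuesday boy will land on 0.50 instead.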
So we can see in the limit as the information becomes more and more specific it turns into the unconditional probability. That is, the case of “the first is a boy, what is the probability both are boys” (50%).
I think this clarifies the situation in the OP pretty well.
I'm a bit at a loss I have to admit.
If we assume that each child really did have a 50/50 chance of being boy or girl, then the result would be that there's a 1/101 chance that it's 100 girls.
Given what I know about the world and genetics and such, I think it's much more likely that there's some predisposition by the couple to have girls.
If we think it's, say, 90/10, then the prior probability of the 100-girls case would be 0.9^100, and the prior probability of each of the 99-girls cases would be 0.9^99 * 0.1 -- i.e. the all-girls case is 9x more likely. 0.9^100 / (100 * 0.9^99 * 0.1 + 0.9^100) = 0.9 / (100 * 0.1 + 0.9) = 0.9/10.9 = about 8% probability of having 100 girls, 92% probability of having 99 girls and 1 boy.
If we think the couple has 99:1 odds of girl:boy on each birth, then it's 0.99^100 / (100 * 0.99^99 * 0.01 + 0.99^100) or about 50/50 on whether they have 99 girls or 100 girls.
If we think the odds are 999:1, then it's 0.999^100 / (100 * 0.999^99 * 0.001 + 0.999^100) = around 90% chance they have 100 girls.
Someone else can do the math assuming an uninformative prior on the couple's girl:boy odds and calculating the posterior distribution given that we know there are 99 girls.
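The arithmetic in this comment follows one formula, so it can be written as a small function. This is a sketch for a fixed per-birth probability only (it does not attempt the uninformative-prior version); `p_all_girls` is a name made up here, and the condition is read as "at least 99 of the 100 children are girls".

```python
from fractions import Fraction

def p_all_girls(p_girl, n=100):
    """P(n girls | at least n-1 girls), births independent with P(girl) = p_girl."""
    all_girls = p_girl ** n
    # n different positions for the single boy.
    one_boy = n * p_girl ** (n - 1) * (1 - p_girl)
    return all_girls / (all_girls + one_boy)

for p in (Fraction(1, 2), Fraction(9, 10), Fraction(99, 100), Fraction(999, 1000)):
    print(p, round(float(p_all_girls(p)), 4))
```

With `p_girl = 1/2` this reproduces the 1/101 figure, and the other three inputs give roughly 8%, 50% and 91%.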
But that's unrealistic. In real life, the context for how and why there would be a speaker telling you such a thing in the first place can be relevant and affect the probability!
How is this possible? Suppose among all the math-riddle-loving parents of two children who would ask such a puzzle in the first place there are an equal number of parents of B-B, B-G, G-B, G-G, and that each is equally likely to ask you such a riddle when you meet them.
Suppose when asking such a riddle the B-B parents tell you "at least one of them is a boy" (they don't have any girls, so that's the only way they can ask this kind of riddle), the G-G parents tell you "at least one of them is a girl" (same thing but in reverse), while the B-G and G-B parents say one of "at least one of them is a boy" and "at least one of them is a girl" equally at random.
Then, conditioned on being told that "at least one of them is a girl", the probability of another girl is actually 1/2, not 1/3 like the paradox answer claims. To see this, imagine the above puzzle being posed 40 times. You get 10 B-B parents saying "at least one of them is a boy", 10 G-G parents saying "at least one of them is a girl", and among the 20 (B-G and G-B) parents, since they choose randomly, you have 10 saying "at least one of them is a boy" and 10 saying "at least one of them is a girl".
So out of the 20 times where "at least one of them is a girl" is said, there are 10 cases where it's a G-G family and 10 cases where it's a B-G or G-B family, therefore conditioned on being told "at least one of them is a girl", the probability of two girls is actually 1/2.
If there were some gender bias in how the B-G and G-B families might ask the question, or other differences that affect how likely different of these people would be posing the puzzle to you, then the probability could be yet different than either of 1/3 or 1/2.
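The dependence on the speaker's behaviour can be made explicit with one parameter. A sketch, assuming the four family types are equally likely among puzzle-askers as in the comment above; `q` is a hypothetical knob for how often a mixed family says "at least one of them is a girl".

```python
from fractions import Fraction

def p_two_girls_given_told_girl(q):
    """P(G-G | told "at least one is a girl").

    q = P(a B-G or G-B parent chooses the "girl" statement); hypothetical parameter.
    G-G parents always say "girl", B-B parents never do.
    """
    told_and_gg = Fraction(1, 4) * 1      # prior 1/4, always says "girl"
    told_and_mixed = Fraction(1, 2) * q   # prior 1/2 for B-G plus G-B
    return told_and_gg / (told_and_gg + told_and_mixed)

print(p_two_girls_given_told_girl(Fraction(1, 2)))  # 1/2 (mixed families pick at random)
print(p_two_girls_given_told_girl(Fraction(1)))     # 1/3 (mixed families always say "girl")
```

Setting `q = 1` recovers the article's 1/3, `q = 1/2` gives the 1/2 from the 40-parent example, and any other bias gives something else.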
So there's a difference between being presented with something as a flat mathematical assertion that you're supposed to take at face value and not question further (where the probability is 1/3, as the article claims) and being told something in real life, where you always need to take into account the context and situation of the speaker, and the probability could be different.
There are real life implications of this too - the big classic one being publication bias / newsworthiness bias. As most people intuitively know by now, it is also often wrong to take the statistical analysis or claims of a particular research study or paper entirely at face value, because there is a bias in the fact that "positive" and "exciting" results are more likely to be reported in the first place, and so statistical outliers that aren't actually replicable are disproportionately likely to be reported (see also https://xkcd.com/882/). And publication bias still occurs with respect to the reporting of results, amplification or not in the media etc, even when the authors themselves are trustworthy and have done their analysis within the paper in a statistically proper way. So conditioned on you hearing about the result in the first place, it is often less likely to be true (and less likely to replicate in the future, etc) than you would think if you just took the statistical analysis in the paper at face value, even when that analysis was done correctly. The situation in the "sisters paradox" of computing a probability taking a statement entirely at logical face value is rare in real life.
Otherwise the context of “how/why you're being given that fact” is relevant, because the problem being discussed asks for P(X+Y=2| being told that “X=1 or Y=1”) and that depends on P(being told that “X=1 or Y=1”|X=1,Y=0), P(being told that “X=1 or Y=1”|X=0,Y=1) and P(being told that “X=1 or Y=1”|X=1,Y=1).
You cannot fault people for noticing the relevance of the missing information and pointing out that the problem you posed is not the one that you think and the answer may not be the one that you propose.
That’s not what the first question said. The first question was: select a family (BB, BG, GB, GG)
Then that they happen to have a girl.
If the family was picked at random, a GG family is twice as likely to have resulted in us finding out one of the children is a girl as a BG or GB family (and a BB family would be ruled out by observation) so you end up with the intuitive but in this case also entirely correct 1/2 chance.
If the family was picked from only those that have at least one girl, a GG family is equally likely as a BG or GB family (and a BB family would be ruled out again but in this case by definition) you end up with the very unintuitive 1/3 chance the article describes.
So the initial filtering is necessary to create this "paradox" (it's actually not a "paradox" but a "problem", as others have mentioned). Without it, the intuitive answer is actually the correct answer.
> Following classical probability arguments, we consider a large urn containing two children.
I like how they modified a classic from probability texts, drawing items from an urn, and made sure it would be big enough in this example to accommodate two kids.
As others are pointing out, this is just the Monty Hall problem. But the way the question is posed there is much clearer.
"You're told that at least one of them is a girl"
> Many people will assume that the author looked at only one child
There is no mention of "looking"
The answer is: it doesn't matter how because that is an unambiguous statement.
It means "you can assume the family does not have two boys".
I think people are actually getting hung up on "you are told" as if that could be a lie, or some kind of trick, when it is really just supposed to mean "here is some more information that you can rely on".
Q1: I looked at only one of a pair of two randomly selected children and it was a girl. What is the probability the other I didn’t see is a girl?
Q2: I looked at both of two randomly selected children and at least one of the pair of children is a girl. What is the probability the other is also a girl?
Q1: I looked at only one of a pair of two randomly selected children and it was a girl. What is the probability there are two girls?
Q2: I looked at both of two randomly selected children and at least one of the pair of children is a girl. What is the probability there are two girls?
But it does not mean that you can assume that p(you're told at least one is a girl | both are girls) = p(you're told at least one is a girl | they aren't both girls) as explained by 6gvONxR4sf7o.
If you allow assuming whatever you want, then many answers are allowed!
That’s what it means that the problem is not “well-posed” as mentioned by in_cahoots. You need additional assumptions to get a definite answer - and the answer will depend on the assumptions.
As JeffJor noted it seems much more natural to have assumptions that keep the symmetry of the problem (because why not?) and the answer 1/2 is not just possible but arguably “better”.
And, of course, neither are paradoxes. They're just math that can seem paradoxical if you don't look closely at it.
"Different readings of the setup imply different answers to p(what you're told | the unknowns)." See https://news.ycombinator.com/item?id=45056790
Do you think that it's perfectly clear that the answer to all the questions here is 2/3? https://news.ycombinator.com/item?id=45057514
Again, much like the Monty Hall problem.
And if you meet someone and you are told that they have at least one boy the probability that they have a girl and a boy is 2/3, because the question has one straightforward and obvious interpretation.
And if you meet several people and you are told that they have at least one girl the probability that they have a girl and a boy is always 2/3, because each time the question has one straightforward and obvious interpretation.
And if you meet several people and you are told that they have at least one boy the probability that they have a girl and a boy is always 2/3, because each time the question has one straightforward and obvious interpretation.
And if you meet several people and sometimes you are told that they have at least one boy and sometimes you are told that they have at least one girl the probability that they have a girl and a boy is always 2/3, because each time the question has one straightforward and obvious interpretation.
It's fine to make whatever assumption you need to get the answer you want but that doesn't make it the "straightforward and obvious interpretation". Assume your assumptions!
https://en.m.wikipedia.org/wiki/Monty_Hall_problem
Case 1: The prize is behind door #1, and the host must open door #2. Probability 1/3.
Case 2: The prize is behind door #2, and the host must open door #1. Probability 1/3.
Case 3: The prize is behind door #3, and the host has a choice. Case 3A: The host opens door #1. Probability Q/3. Case 3B: The host opens door #2. Probability (1-Q)/3.
If the host actually opens door #1, the probability that door #2 has the prize is (Case 2)/(Case 2 + Case 3A) = (1/3)/(1/3+Q/3) = 1/(1+Q).
If the host actually opens door #2, the probability that door #1 has the prize is (Case 1)/(Case 1 + Case 3B) = (1/3)/(1/3+(1-Q)/3) = 1/(2-Q).
My point is that, since you get to see which door is opened, 2/3 is correct only if you assume Q=1/2. We aren't told what Q is, but we must assume it is 1/2 because otherwise the answer is different depending on which door is chosen.
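The case analysis above can be checked with exact fractions. This is only a sketch: it assumes, as the cases imply, that the contestant picked door #3, and the function name `switch_win_probability` is made up for illustration.

```python
from fractions import Fraction

def switch_win_probability(q, opened):
    """Chance that switching wins, given which door the host opened.

    Assumes the contestant picked door #3 and that
    q = P(host opens door #1 | prize is behind door #3).
    """
    third = Fraction(1, 3)
    case1 = third              # prize behind #1, host must open #2
    case2 = third              # prize behind #2, host must open #1
    case3a = q * third         # prize behind #3, host opens #1
    case3b = (1 - q) * third   # prize behind #3, host opens #2
    if opened == 1:
        return case2 / (case2 + case3a)   # equals 1/(1+Q)
    if opened == 2:
        return case1 / (case1 + case3b)   # equals 1/(2-Q)
    raise ValueError("the host can only open door 1 or door 2")

for q in (Fraction(0), Fraction(1, 2), Fraction(1)):
    print(q, switch_win_probability(q, 1), switch_win_probability(q, 2))
```

Only Q = 1/2 gives 2/3 for both doors; at Q = 0 or Q = 1 the two answers are 1 and 1/2, depending on which door was opened.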
You’re right if we are asking about a specific case though.
Am I wrong?
You pick a door. Probability 1/3 it has the car behind it. The host then picks another door, revealing the booby prize. That door now has probability 0 of having the car. But you still picked a door with probability 1/3 of having the car, which means there's a 2/3 chance the other door you didn't pick has the car. Write a program to run a Monte Carlo simulation... of the Monty Hall problem... and you will see this is the case.
The host has given you information about where the car is by revealing where it is not. Whether you can effectively use that information is another matter.
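A minimal version of the Monte Carlo simulation suggested above might look like the following. It assumes the standard rules (the host always opens a non-prize door other than yours, choosing uniformly when two qualify); the names `play` and `win_rate` are illustrative only.

```python
import random

def play(switch, rng):
    """Play one round; return True if the contestant ends up with the car."""
    doors = [0, 1, 2]
    car = rng.choice(doors)
    pick = rng.choice(doors)
    # Host opens a door that is neither the contestant's pick nor the car,
    # choosing uniformly at random when two doors qualify (an assumption).
    host = rng.choice([d for d in doors if d != pick and d != car])
    if switch:
        pick = next(d for d in doors if d != pick and d != host)
    return pick == car

def win_rate(switch, trials=100_000, seed=0):
    """Fraction of simulated rounds won with the given strategy."""
    rng = random.Random(seed)
    return sum(play(switch, rng) for _ in range(trials)) / trials

print("stay:  ", win_rate(switch=False))
print("switch:", win_rate(switch=True))
```

Under these assumptions the "stay" rate should land near 1/3 and the "switch" rate near 2/3. Note that this averages over both doors the host might open, so it does not distinguish a biased host from an unbiased one.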
of those 3 cases, 1/3 are both girls
If you phrase the question as “someone with two children tells you the gender of a random one, what is the chance the other is the same gender?” Chance is 50/50 because 50% will have BB or GG and the vaporizer isn’t active.
The trick of the so-called "paradox" is turning the question into the Monty Hall but with an ambiguous enough formulation that you might be fooled into thinking it’s not.
vs
"The question writer came across a girl from a two child family, then they asked the exact question above". This is 1/2 chance - select gg from [gg, gg, bg, gb] with gg listed twice since there's two ways to select a girl from that set; ie. coming across a girl is twice as likely to occur from the gg case than it is either gb or bg.
I think that's the clearest wording to get the message across. Either way it's the exact same question but it reasonably has a completely different answer. There's no way to resolve this ambiguity with the question as written.
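The two readings can be enumerated directly. In this sketch the first reading (the family is selected because it has at least one girl) is an assumption about the other interpretation being contrasted; the second reading follows the "came across a girl" wording above.

```python
from fractions import Fraction
from itertools import product

# Four equally likely two-child families: BB, BG, GB, GG.
families = list(product("BG", repeat=2))

# Reading 1 (assumed): the family was selected because it has at least one girl.
with_girl = [f for f in families if "G" in f]
p_filter = Fraction(sum(f == ("G", "G") for f in with_girl), len(with_girl))

# Reading 2: the writer came across one child at random, and it was a girl.
# Each (family, child index) pair is equally likely, so GG is counted twice.
encounters = [(f, i) for f in families for i in (0, 1)]
girl_seen = [(f, i) for f, i in encounters if f[i] == "G"]
p_encounter = Fraction(sum(f == ("G", "G") for f, _ in girl_seen), len(girl_seen))

print(p_filter)     # 1/3
print(p_encounter)  # 1/2
```

The question text is identical in both cases; only the assumed process that produced "at least one is a girl" differs.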
[0] https://astronomy.stackexchange.com/a/55505
If you look at one random child, see it's a boy and exclude the family, even though the other child may be a girl, then you get the 1/2 probability. If, however, in that case you also look at the second child, see that it's a girl and consider the family anyway, then you get the 1/3.
Because most people don't talk formal probabilities, your explanations will be so vague that the other person will not realize your different understanding. You will discuss forever, you will both be right, and you will part ways with the strange feeling that maybe the other person was right, when all along you were talking about different problems. This is why this problem is so notorious.
[0] https://news.ycombinator.com/item?id=24707305
I think this is true of the "children" question, but I actually disagree that this is what makes the Monty Hall question so confusing.
For one thing, I vaguely recall this being asked directly, and even after people agree on all the definitions explicitly, they still consider the answer wrong. (See e.g. some mathematicians like Erdos refusing to believe the correct answer without actually running simulations on computers... by that point you clearly have a real definition.)
For another, when I personally talk to people about Monty Hall, even after I explain the correct answer, and explain all the nuances, people tend to still have a hard time accepting the correct solution and claim to find it counterintuitive (as did I!).
(Also, this problem has an additional layer of ambiguity where “birth order is irrelevant,” but MF and FM are treated as distinct items in the probability set. Is the order irrelevant to the probability, or is it impossible to distinguish the age of the two children? It would be clearer to simply say “each birth is an independent event.” One of the comments on the blog explains this better than I can.)
Almost everybody understands the same problem, AND THEY STILL GET DIFFERENT ANSWERS. If they don't understand it, they make pedantic arguments about Monty's motivations. All of which make the puzzle impossible to answer.
What they don't understand is probability. Probability is a measure of the information you lack about what causes a certain result to occur. That includes the physical details (where the prize is, what the genders are) but also the choices made for hidden reasons.
In the Monty Hall Problem, to reduce complexity, label the doors C (the contestant's original door), R (the door to its right, wrapping around if necessary), and L (the door to its left). What leads up to the game state at the time the decision to switch is made are (A) Where the prize was placed and (B) How Monty Hall chooses a door to open if the prize is behind C.
The naive answer is based on only (A). The two unopened doors (C and R, or C and L) started with the same probability. So they must now have the same probability, 1/2, right? No, wrong, because we need to take (B) into account. If the prize is behind R then the host had to open L. If the prize is behind L then the host had to open R. But if the prize is behind C then the host had to choose. Since we don't know how, we have to assume there was a 50% chance that he would choose R, and 50% for L. Once we see him open, say, R? This 50:50 reduces the probability that the prize is behind C, so switching becomes twice as likely to win.
The Two Child Problem works exactly the same way. What leads up to the point where we are asked for a probability is (A) the gender makeup of the family and (B) how the information came to us if there is a boy and a girl.
The naive answer is based on only (A). A mixed family is twice as likely as either two-of-a-kind family. So the probability of two-of-a-kind is 1/3, right? Wrong, unless we know WITH CERTAINTY that we could not have learned about the other gender. If we do not have that certainty, then just like with Monty Hall we have to assume that half of the time in a mixed family we would have learned the other gender. This makes a mixed family half as likely as (A) alone would suggest; in other words, the same as two-of-a-kind. The answer is 1/2.
Joseph Bertrand pointed out, in 1889, why we need to take (B) into account. Martin Gardner, who originated the Two Child Problem, repeated it in 1959, in the same article where he introduced the predecessor to Monty Hall (called the Three Prisoners Problem) and explained why (B) is important. It should be embarrassing to anyone who thinks that the "Tuesday" variation's answer is 13/27, because that variation was first mentioned at a puzzle convention named in honor of Martin Gardner, and its proponents forgot his warning. Adding irrelevant information can't change the answer, and if you take (B) into account the answer doesn't change.
Exactly. As the first reply to the first comment explains “The problem is that we don't know p(you're told at least one is a girl | they aren't both girls).”
It’s funny that the same commenter who writes that “to get the right answer you must be careful about conditional probabilities” finds that doing so is “splitting hairs about something boring and irritating.”
I think this is a reasonable interpretation:
You meet a family at a party. They say "We have two children". You ask "Do you have any girls"? They say "yes!"
This will give you 1/3 probability that the other child is also a girl.
I think this interpretation is more intuitive because it doesn't make any assumptions about how you get your information. Usually in probability questions you assume any information you have is given to you from on high. For example, you just "know" that the family has two children, you don't somehow deduce it. Therefore I assume the same for "one child is a girl" information.
Do you mean “interpretation” or “alternative problem”?
Because if it’s an “interpretation” of the original problem you’re indeed making assumptions to fill the unspecified information.
If you mean that it’s an alternative problem which has a definite solution I agree. It’s a different problem and its relevance to the original one is to illustrate that additional assumptions were required.
The original problem cannot be answered without making additional assumptions about how you get your information. Different interpretations may reach different answers by making different assumptions.
ambiguous?
This is the crux of the thing. Different readings of the setup imply different answers to p(what you're told | the unknowns).
It's also a great case of where Bayes' rule shorthand can be slippery. You'll usually abbreviate it out (hell, it was tedious to write this way even with copy-paste). But if you abbreviate "you're told there's at least one girl" to "there's at least one girl", then you've stopped modeling a crucial part of the setup. p(there's at least one girl | they aren't both girls) has an unambiguous answer.
Q2: "A family has two children. You're told that at least one of them is a boy. What's the probability both are boys?"
Note that these are symmetric problems, and must have the same answer.
Q3: "A family has two children. You're told that a gender, that applies to at least one, is written inside a sealed envelope. What's the probability both have that gender?"
In Q3, we have no information. So the answer is the proportion of two-child families that are single gendered. That is, 1/2.
But if we open the envelope, and read what is written inside, the problem becomes either Q1 or Q2. Which have the same answer. So we don't have to open it; whatever the answer to Q1 and Q2 is, opening the envelope in Q3 makes its answer the same. If that answer is 1/3, we have a paradox. The answer has to be 1/2 if we don't look.
This is what is known as "Bertrand's Box Paradox." Well, if we add a fourth box to his problem, with one gold and one silver coin. I realize that in modern times the problem itself is called the paradox, but what Bertrand actually wrote (edited to this problem) was "How can it be that opening the envelope suffices to change the probability from 1/2 to 1/3?"
The resolution is that probability must be based on the full set of possibilities, not the possibilities that _could_ result from the full set of _states._ These are the possibilities for this problem:
1) BB and you are told that there is at least one boy.
2A) BG and you are told that there is at least one boy.
2B) BG and you are told that there is at least one girl.
3A) GB and you are told that there is at least one boy.
3B) GB and you are told that there is at least one girl.
4) GG and you are told that there is at least one girl.
Each numbered case has a prior probability of 1/4. Let's say the "A" subcases have a probability of Q/4, so the "B" subcases have a probability of (1-Q)/4.
The answer to the first problem is the probability of case 1, which is 1/4, divided by the total probability of cases 1, 2A, and 3A, which is (1+2Q)/4. That's 1/(1+2Q).
The answer to the second problem is the probability of case 4, which is 1/4, divided by the total probability of cases 4, 2B, and 3B, which is (3-2Q)/4. That's 1/(3-2Q).
Bertrand's paradox, stated another way, is that these must be equal, but can only be equal if Q=1/2 and both answers are 1/2.
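The dependence on Q can be tabulated with exact arithmetic. This is a sketch of the six-case model above, where `q` stands for the probability that a mixed family is described as "at least one boy"; the function name `two_child_answers` is hypothetical.

```python
from fractions import Fraction

def two_child_answers(q):
    """Return (P(BB | told 'boy'), P(GG | told 'girl')) for a given q.

    q = P(you are told 'at least one boy' | the family is mixed).
    """
    quarter = Fraction(1, 4)
    bb = quarter                              # case 1
    gg = quarter                              # case 4
    mixed_told_boy = 2 * q * quarter          # cases 2A + 3A
    mixed_told_girl = 2 * (1 - q) * quarter   # cases 2B + 3B
    p_bb_given_boy = bb / (bb + mixed_told_boy)      # 1/(1+2Q)
    p_gg_given_girl = gg / (gg + mixed_told_girl)    # 1/(3-2Q)
    return p_bb_given_boy, p_gg_given_girl

for q in (Fraction(0), Fraction(1, 2), Fraction(1)):
    print(q, *two_child_answers(q))
```

At Q = 1/2 both answers are 1/2. At Q = 1 the "boy" answer is 1/3 but the "girl" answer is 1, and at Q = 0 the roles are reversed, so the two symmetric questions agree only at Q = 1/2.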
Take a look at this problem beginning with no assumptions. We have two kids, and an envelope that contains 'B' or 'G'. Our probability space is (B,G)^3, with each having probability of 1/8.
Now, we add information about the match as conditioning. Conditional on being told that the envelope matches the family, we can exclude the BBG and GGB cases. That brings us down to 6, of which we have BBB, GGG, and (BG,GB)(B,G). With this additional information, the probability of matching genders becomes 1/3. This probability is still 1/3 if we open the envelope to find B or G, since we exclude all three cases where the envelope doesn't match our observation of it.
In my view, this is related to the Monty Hall problem; we have to realize that we're given additional information with the statement/envelope.
Your error is that you seem to be deciding the answer is 1/3 first, forcing you to assume whatever makes that so. You literally said that when you said the probability must become 1/3.
I do look at this problem from the beginning with no assumptions. If you want to be pedantic, the probability space comprises the sample space (the set of possible outcomes), an event space (a set of subsets of the sample space with certain properties), and a probability function Pr(*) that maps each event in the event space to a number in [0,1]. The complete sample space is {BBb, BBg, BGb, BGg, GBb, GBg, GGb, GGg}, which I assume is what you want (B,G)^3 to mean. We then need the events {BBb, BBg}, {BGb, BGg}, {GBb, GBg}, and {GGb, GGg} to all have probability 1/4.
Since we make no assumptions about how the lower-case letter is “generated” other than IT MUST MATCH ONE OF THE UPPER CASE ONES, we get that Pr({BBb}) = Pr({GGg}) = 1/4 and Pr({BBg}) = Pr({GGb}) = 0. What we need to determine is how we get Pr({BGb})+Pr({BGg}) = Pr({GBb})+Pr({GBg}) = 1/4.
In order to make the answer become 1/3 in Q1 and Q2, as you assert, we must assume that we do know how the lower-case letter is generated. In Q1 and Q2 we must assume it is an answer to “is there a girl/boy.” This makes one half of each pair 1/4, and the other 0. If we make no assumption, then the Principle of Indifference (literally, that we make no assumptions to distinguish functionally equivalent outcomes) says Pr({BGb}) = Pr({BGg}) = Pr({GBb}) = Pr({GBg}) = 1/8. This makes one answer:
A1 = Pr({GGg}) / [Pr({BGg}) + Pr({GBg}) + Pr({GGg})] = (1/4) / [1/8 + 1/8 + 1/4] = 1/2
Yes, this is a variation of the Monty Hall Problem. Most "solutions" to it are really just explanations for how it can make sense. The mathematical solution follows the outline I used above. It is based on the probability, if the door you choose has the prize (compare to a mixed-gender family), that Monty will open door X or door Y (i.e., the other two) as determined PRIOR TO it happening. If you assume it is 100% for the door he did open, which is only determined AFTER HAVING SEEN IT, like you want to assume in Q1 and Q2 that only a girl/boy can be mentioned, then the answer is that switching does not matter. It is only if you use the Principle of Indifference – meaning each has a 50% chance – that the answer is that switching wins 2/3 of the time.
This is equivalent to the host never opening the door with the car in the Monty Hall scenario
Once you open the sealed envelope and it says "girl", it does not become Q1, it becomes a different question:
Q4: "A family has two children. I randomly sampled one of the children and it was a girl. What's the probability both are girls?"
In which case, we're looking at possibilities 4, 5, 7, and 8, and in only 2 of those 4 possibilities are both children girls.
In Q1, you're actually told "A family has two children. I looked at both children and can tell you that at least one of them is a girl. What's the probability that both are girls?". In which case, possibilities 3, 4, 5, 6, 7, 8 are all valid. Only in 2 of those 6 possibilities are both children girls.
So as in_cahoots said in https://news.ycombinator.com/item?id=45053187, it matters whether the person asking looked at both children or just a single one.
Otherwise you can derive any probability as a branch of a probability tree that contains it and calculate the probabilities of the tree and then the one of the branch. This makes no sense.
For example, a family has a kid and the kid is a girl. The family wants another kid; what is the probability to be a girl? Is it 1/4 because having 2 girls is 1/4? No, it is 1/2 as it is for any new kid.
Q1: I looked at only one of a pair of two randomly selected children and it was a girl. What is the probability the other I didn’t see is a girl?
Q2: I looked at both of two randomly selected children and at least one of the pair of children is a girl. What is the probability the other is also a girl?
The question in the article is the second question, not the first. The fact that the observer looked at both children and not just one of them is crucial. As is often the case in these puzzles the exact information available is the critical issue.
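The difference between the two observation procedures shows up in a quick simulation. This sketch assumes each child is independently a boy or a girl with probability 1/2, and that in Q1 the observed child is picked uniformly at random; the function name `simulate` is illustrative.

```python
import random

def simulate(trials=200_000, seed=0):
    """Estimate P(two girls) under the Q1 and Q2 observation procedures."""
    rng = random.Random(seed)
    q1_seen = q1_both = 0
    q2_seen = q2_both = 0
    for _ in range(trials):
        kids = [rng.choice("BG"), rng.choice("BG")]
        both_girls = kids == ["G", "G"]
        # Q1: look at only one child, chosen at random, and it is a girl.
        if rng.choice(kids) == "G":
            q1_seen += 1
            q1_both += both_girls
        # Q2: look at both children; at least one is a girl.
        if "G" in kids:
            q2_seen += 1
            q2_both += both_girls
    return q1_both / q1_seen, q2_both / q2_seen

q1, q2 = simulate()
print("Q1 estimate:", q1)
print("Q2 estimate:", q2)
```

Under these assumptions the Q1 estimate should come out near 1/2 and the Q2 estimate near 1/3.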
Simply stated, if you allow the possibility space of "boy-girl" and "girl-boy", you have to also have two "girl-girl" states. Since you don't know which of the kids is known. Why is that not correct?
State it with coins, if I know that you flipped a quarter and a dime and one turned up heads, what are the odds that both are heads?
As you can see there aren’t two HH states just as there aren’t two GG states in the original question.
Edit: I want to be clear that my initial thinking was exactly what you said. I was trying to "steelman" the bad intuition and I think I've trapped myself. :D
> Assume the family is selected at random because they have at least one girl.
Given that plus "a family has two children" and "Assume that the probability of having a girl or boy is 50%"
That means you're starting from the set of all two child families: BB, BG, GB, and GG, being told that you do not have the BB case, leaving 3 ways in which the family could be composed and being asked about "the one which is not a G".
That's different from the dime and quarter case, and would also be different if you were told "the oldest child is a girl", because being told "the oldest child is a girl" eliminates both BB and BG.
Being told "[at least] one of the coins is heads" or "[at least] one of the kids is a girl" only eliminates one of the four cases, while being told "the quarter is heads"/"the oldest is a girl" eliminates two cases.
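The elimination argument can be spelled out by enumeration. This sketch treats each statement as plain conditioning on an event (an assumption about how the information was obtained); the helper name `p_both_heads` is made up for illustration.

```python
from fractions import Fraction
from itertools import product

# Four equally likely outcomes, written as (dime, quarter).
outcomes = list(product("HT", repeat=2))

def p_both_heads(condition):
    """Return (number of outcomes kept, P(both heads | condition))."""
    kept = [o for o in outcomes if condition(o)]
    both = sum(o == ("H", "H") for o in kept)
    return len(kept), Fraction(both, len(kept))

# "At least one of the coins is heads" removes only TT.
print(p_both_heads(lambda o: "H" in o))      # (3, Fraction(1, 3))
# "The quarter is heads" removes both TT and HT.
print(p_both_heads(lambda o: o[1] == "H"))   # (2, Fraction(1, 2))
```

The first statement keeps three outcomes and the second keeps two, which is where the 1/3 versus 1/2 difference comes from under this reading.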
Consider. You flip a dime and a quarter, one of them is heads. What are the odds that the dime is heads? What are the odds that both are heads?
If you do as stated in this model, you would say the possible states are HT, TH, HH. Doing that, you conclude that the odds of any single one being heads is 2/3. But, that does not make any sense, as you did nothing to change the expected 1/2 odds of either coin. And if the odds of either being H is 2/3, then the odds of both would be?
So, in the original question, what would the expectation be on the question of "what are the odds that the first born is a girl, knowing that at least 1 is a girl?" My intuition would be that it should be 1/2, as the two children are independent events, just as the two coin flips are independent of each other. Not knowing anything about either coin, the best you can do is the original odds. (Things change if talking about after you observe one, of course. If you see a heads, and I say that at least one is heads, you are then modeling if I am disclosing the one you already know or not. Basically, we have moved into liar's dice.)
So, how can you model this such that you keep the 50% odds per coin, but also have it so that the "at least one of" could apply to either coin?
Conversely, why does knowing "at least 1 is heads" change the odds of either single coin being heads?
Yeah, I'm with you here. Assuming heads=girl and tails=boy:
* If order does NOT matter (GB=BG), then it means Alice-Albert (GB, Heads-Tails) is the same as Albert-Alice (BG, Tails-Heads).
* If order DOES matter (GB!=BG), then it means Alice-Barbara (GG, Heads-Heads) is different than Barbara-Alice (GG, Heads-Heads). Therefore, GG!=GG.
Either way, the stated problem seems badly defined.
We don't care about which came first or second, only what gender each child is.
Thus the answer, to the question given the information you have, is 50%. The only possible outcomes are girl-girl or girl-boy (where order is irrelevant.)
And this is absolutely NOT the Monty Hall Problem. I don't know why some people are making that reference. The Monty Hall Problem contains three possible choices and one is eliminated by the host, this is what makes the statistical math interesting in that problem. None of that is happening here.
Let's look at the exact wording of the question:
> a family has two children. You're told that at least one of them is a girl. What's the probability both are girls?
We have a family with two children. Assume we don't know their gender. We'll represent them as XX.
We are told one of them is a female. So now they are represented as GX (remember GX = XG, since order doesn't matter.)
You are left with the question what is the probability that X is female? Well there are only two choices, F and M, and we are told elsewhere that the probability of having a girl is 50/50.
> Assume that the probability of having a girl or boy is 50% and that the birth order has no effect on the probability.
So the chance of X being female is 50%. Thus the answer is 50%.
You can't say birth order doesn't matter and then use birth order to say the FM and MF are different results. The only possible results are FM and FF (since birth order is irrelevant.)
It's the fact that there are twice as many boy-girl families as there are girl-girl families in the world.
“Here's the problem: a family has two children. You're told that at least one of them is a girl. What's the probability both are girls?”
— This is the complete statement of the problem! Everything else is an assumption that may or may not be correct. And is certainly not necessarily a complete set of underlying assumptions relevant to the problem statement.
“Assume that the probability of having a girl or boy is 50% and that the birth order has no effect on the probability. Assume the family is selected at random because they have at least one girl.”
— This is not a part of the statement of the problem! These are a subset of assumptions that you can choose to accept, or not. As a modeler or decision analyst you have to make that distinction. Eh, let’s accept them, for the time being. We’ll even assume the narrator is honest, which isn’t a stated assumption.
But let’s add to that list of assumptions. The narrator telling you that one of them is a girl gets all winnings from bets on the outcome of the unknown gender of the “other child” and wants those winnings. The narrator knows that a probability tree analysis of the problem, with perhaps unwarranted assumptions of independence and prior probabilities, will lead to an assignment of 1/3 probability for the other sibling being a girl, and knows you know that result and believes you want to win. [A valid credible interpretation, not misinterpretation, of the original problem statement.]
“What do you think the probability is that both children are girls?”
— Let’s make this question more actionable. “Should you take the even odds bet on both children being girls made by the narrator? The stake is $100: if they are both girls, the narrator wins $100 and you lose $100; if they aren’t, you win $100 and the narrator loses $100. The narrator and you want to win the money.”
The answer to this question, which seemingly follows from the question of probabilities to be “yes”, is, in fact, “no” - under the additional valid, and quite credible, assumptions made. Because you will only be presented sets of two-girl pairs by the narrator. Let’s assume the “assumptions” are actually correct, and the families will be indeed selected at random, and in the general population there is a 50-50 mix of boys and girls. There is nothing, even in the “assumptions”, that precludes the narrator from preselecting and only presenting two-girl pairs to you, thus always winning when you believe and follow the 1/3 two-girl result.
The statement of the problem, and only the statement of the problem, underspecified as it is, leads to a whole suite of possibly correct answers. The problem is the territory, the problem statement and assumptions are the map.
None of these maps are the territory, necessarily. The probability tree answer is just as sloppy, from a decision analyst perspective, as the naive answer.