Attractive Parents Have More Female Offspring
Kanazawa (2007) predicts from evolutionary psychology that attractive parents will have more daughters, and finds evidence supporting his prediction:
The generalized Trivers–Willard hypothesis . . . proposes that parents who possess any heritable trait which increases the male reproductive success at a greater rate than female reproductive success in a given environment will have a higher-than-expected offspring sex ratio, and parents who possess any heritable trait which increases the female reproductive success at a greater rate than male reproductive success in a given environment will have a lower-than-expected offspring sex ratio. One heritable trait which increases the reproductive success of daughters much more than that of sons is physical attractiveness. I therefore predict that physically attractive parents have a lower-than-expected offspring sex ratio (more daughters). Further, if beautiful parents have more daughters and physical attractiveness is heritable, then, over evolutionary history, women should gradually become more attractive than men. The analysis of the National Longitudinal Study of Adolescent Health (Add Health) confirms both of these hypotheses. Very attractive individuals are 26% less likely to have a son, and women are significantly more physically attractive than men in the representative American sample.
Update: Gelman (2006), a statistician, casts serious doubt on Kanazawa’s interpretation of the data:
Physical attractiveness in the survey used by this paper was measured on a five-point scale, from “very unattractive” to “very attractive”. The key result was that 44% of the children of surveyed parents in category 5 (“very attractive”) are boys, as compared to 52% of children born to parents from the other four attractiveness categories. With a sample size of about 3000, this difference is statistically significant (2.44 standard errors away from zero).
In interpreting this statement of statistical significance, however, we should consider the arbitrariness of picking out category 5 and comparing it to 1–4. Why not compare 4 and 5 (“attractive” or “very attractive”) to 1–3? Given the many comparisons that could be done, it is not such a surprise that one of them is statistically significant at the 5% level.
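To see how much the freedom to choose a cutpoint matters, here is a small simulation sketch under assumptions of our own: there is no true effect (every parent has a 50% chance of a boy), the five attractiveness categories have hypothetical sizes summing to 3000, and each split is tested with a pooled two-proportion z-test. The category sizes and function names are illustrative only, not taken from Kanazawa's data. It compares how often the single split 1–4 vs 5 reaches p < 0.05 with how often at least one of the four possible splits does.

```python
import math
import random
from statistics import NormalDist

# Hypothetical sizes for attractiveness categories 1..5 (illustrative only,
# not the Add Health counts); they sum to roughly the quoted sample size.
CATEGORY_SIZES = [100, 300, 1300, 900, 400]
ALPHA = 0.05


def two_proportion_p_value(boys_a, n_a, boys_b, n_b):
    """Two-sided p-value for a difference in proportions (pooled normal approximation)."""
    p_pool = (boys_a + boys_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 1.0
    z = (boys_a / n_a - boys_b / n_b) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


def cutpoint_p_values(boys, sizes):
    """p-values for the splits 1 vs 2-5, 1-2 vs 3-5, 1-3 vs 4-5, 1-4 vs 5."""
    p_values = []
    for cut in range(1, len(sizes)):
        low_boys, low_n = sum(boys[:cut]), sum(sizes[:cut])
        high_boys, high_n = sum(boys[cut:]), sum(sizes[cut:])
        p_values.append(two_proportion_p_value(low_boys, low_n, high_boys, high_n))
    return p_values


def simulate(reps=1000, p_boy=0.5, seed=1):
    """Share of null datasets where the fixed split (1-4 vs 5) is significant,
    versus where the best of the four splits is significant."""
    rng = random.Random(seed)
    fixed_hits = best_hits = 0
    for _ in range(reps):
        # Under the null, every category has the same probability of a boy.
        boys = [sum(rng.random() < p_boy for _ in range(n)) for n in CATEGORY_SIZES]
        p_values = cutpoint_p_values(boys, CATEGORY_SIZES)
        fixed_hits += p_values[-1] < ALPHA   # last split is 1-4 vs 5
        best_hits += min(p_values) < ALPHA   # report whichever split "worked"
    return fixed_hits / reps, best_hits / reps


if __name__ == "__main__":
    fixed_rate, best_rate = simulate()
    print(f"one prespecified split: {fixed_rate:.3f}")
    print(f"best of four splits:    {best_rate:.3f}")
```

With a single prespecified split the false-positive rate should stay near 5%; reporting whichever of the four splits comes out best pushes it well above that, and adding the linear regression or a choice among survey waves would push it further still.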
Perhaps the most natural analysis of these data would be a regression of the proportion of boys on the numerical attractiveness measure. Using the data in Fig. 1 of the paper, the estimated regression coefficient is −1.5 with a standard error of 1.4—thus, not statistically significant. (Weighting by the approximate number of parents in each category does not appreciably change this result).
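The arithmetic behind "not statistically significant" can be checked directly from the two numbers quoted above. This sketch uses a normal approximation; if the standard error came from a regression on the five category means, a t distribution with so few degrees of freedom would give an even larger p-value. The unit of the slope (presumably percentage points of boys per attractiveness category) does not affect the ratio.

```python
from statistics import NormalDist

# Slope of the proportion of boys on the 1-5 attractiveness score, as quoted.
coef, se = -1.5, 1.4

z = coef / se
p_two_sided = 2 * (1 - NormalDist().cdf(abs(z)))  # normal approximation
ci_95 = (coef - 1.96 * se, coef + 1.96 * se)      # approximate 95% interval

print(f"z = {z:.2f}, p = {p_two_sided:.2f}, 95% CI = ({ci_95[0]:.1f}, {ci_95[1]:.1f})")
```

The estimate is about 1.07 standard errors from zero (two-sided p of roughly 0.28), and the approximate interval comfortably includes zero.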
I have little to say about the difficulties of measuring attractiveness except that, according to the paper, interviewers in the survey seem to have assessed the attractiveness of each participant three times over a period of several years. I would recommend using the average of these three judgments as a combined attractiveness measure. General advice is that if there is an effect, it should show up more clearly if the x-variable is measured more precisely. I do not see a good reason to use just one of the three measures.
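The point about measurement precision is the classic attenuation (regression dilution) effect: independent noise in the x-variable shrinks a regression slope toward zero by the reliability ratio var(true)/(var(true) + var(noise)), and averaging three ratings with independent errors cuts the noise variance to a third. The simulation below is a hypothetical sketch, not an analysis of Add Health: it assumes a continuous outcome, a standard-normal latent attractiveness, and rating noise with the same variance as the trait, so with a true slope of 1 the expected estimates are about 0.5 from one rating and about 0.75 from the average of three.

```python
import random


def ols_slope(x, y):
    """Least-squares slope of y on x: cov(x, y) / var(x)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def attenuation_demo(n=20000, true_slope=1.0, rating_noise_sd=1.0, seed=2):
    """Compare slopes estimated from one noisy rating versus the mean of three.

    Hypothetical setup: latent attractiveness ~ N(0, 1); each of three ratings
    adds independent N(0, rating_noise_sd**2) error; the outcome depends only
    on the latent trait.
    """
    rng = random.Random(seed)
    latent = [rng.gauss(0, 1) for _ in range(n)]
    ratings = [[t + rng.gauss(0, rating_noise_sd) for _ in range(3)] for t in latent]
    outcome = [true_slope * t + rng.gauss(0, 1) for t in latent]

    single = [r[0] for r in ratings]         # use only the first rating
    averaged = [sum(r) / 3 for r in ratings]  # average of all three ratings
    return ols_slope(single, outcome), ols_slope(averaged, outcome)


slope_single, slope_avg = attenuation_demo()
print(f"single rating: {slope_single:.2f}, average of three: {slope_avg:.2f}")
```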
One way to summarize the multiple comparisons criticism is to consider the number of possible analyses that could have been considered by Kanazawa in comparing different levels of attractiveness. In addition to the linear regression, there is the comparison of category 1 to categories 2–5, the comparison of 1–2 to 3–5, the comparison of 1–3 to 4–5, and the comparison of 1–4 to 5. Any of these, if statistically significant, could have been chosen to be reported. In addition, with three waves of data, the researcher could report the results from wave 1, wave 2, wave 3, or the average of all three waves. This comes to 5×4=20 possible comparisons. A simple Bonferroni correction multiplies each p-value by the number of potential comparisons (equivalently, divides the significance level by it), so that to achieve statistical significance at the 5% level, one would need an individual comparison with a p-value below 0.05/20=0.0025. In comparison, Kanazawa’s reported result is 2.44 standard errors away from zero, which corresponds to a p-value of 0.015 (that is, 1.5%), which would be statistically significant on its own but not as one of 20 possible comparisons. In short, the observed result in this study could easily occur by chance, given the large number of potential comparisons that could be made with these data. (In fact, this p-value is not even statistically significant as one of five comparisons, if we were to ignore the possibility of using data from any one of the three waves.) . . .
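The p-value and Bonferroni thresholds in this paragraph can be reproduced in a few lines. This sketch assumes a two-sided test with a normal approximation for the 2.44-standard-error result; the counts of analyses (5) and of attractiveness measures (4) are the ones listed above.

```python
from statistics import NormalDist

ALPHA = 0.05
z_reported = 2.44                                  # reported result, in standard errors
p_value = 2 * (1 - NormalDist().cdf(z_reported))  # two-sided, normal approximation

# Analyses listed above: the linear regression plus four cutpoint splits (5),
# each using wave 1, 2, 3, or the average of all three waves (4).
n_analyses = 5
n_measures = 4

for k in (1, n_analyses, n_analyses * n_measures):
    threshold = ALPHA / k  # Bonferroni: per-comparison level alpha / k
    verdict = "significant" if p_value < threshold else "not significant"
    print(f"k = {k:2d}: need p < {threshold:.4f}; p = {p_value:.4f} -> {verdict}")
```

It gives p ≈ 0.0147: below 0.05 on its own, but above both 0.05/5 = 0.01 and 0.05/20 = 0.0025.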
[Kanazawa's paper] includes a mistake in interpreting a logistic regression coefficient. The difference reported in this study was 44% compared to 52%: the most attractive parents in the study had a rate of girls 8 percentage points higher. One could also say that the proportion of boys was 0.08/0.52=15% lower among the most attractive parents. But the paper reports that “very attractive respondents are about 26% less likely to have a son as the first child”. This appears to be based on an incorrect interpretation of a logistic regression of sex of child on an indicator for whether the parent was judged to be very attractive. The logistic regression coefficient was −0.31. Since the probabilities are near 0.5, the correct way to quickly interpret the coefficient is to divide it by 4: −0.31/4=−0.08, thus a difference of 8 percentage points (which is what we saw above). For some reason, Kanazawa exponentiated the coefficient: exp(−0.31)=0.74, then took 0.74−1=−0.26 to get a result of 26%, which cannot be interpreted in the way suggested in the paper. The 26% can be interpreted in terms of the odds (p/(1−p), where p is the probability of a son): an odds ratio of 0.74 means the odds of a son are about 26% lower. But the statement “26% less likely” is an incorrect summary of the regression (setting aside the multiple comparisons problems discussed in point 2 of this letter). This is particularly unfortunate since 26% was the number reported in the press.
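A short sketch of the three readings of the coefficient discussed above: the divide-by-4 approximation, the exact change in probability, and the odds ratio. It assumes the 52% proportion of boys among the other parents as the baseline and ignores any other covariates in Kanazawa's model. The divide-by-4 rule works because the slope of the logistic curve, p(1 − p), is at most 1/4, reached at p = 0.5.

```python
import math

coef = -0.31       # logistic coefficient on the "very attractive" indicator
p_baseline = 0.52  # share of boys among parents in categories 1-4 (assumed baseline)


def logit(p):
    return math.log(p / (1 - p))


def inv_logit(x):
    return 1 / (1 + math.exp(-x))


divide_by_4 = coef / 4                               # quick approximation to the probability change
p_attractive = inv_logit(logit(p_baseline) + coef)  # implied share of boys, very attractive parents
odds_ratio = math.exp(coef)                          # ratio of odds p/(1-p), not of probabilities
relative_risk = p_attractive / p_baseline            # ratio of probabilities

print(f"divide-by-4 approximation: {divide_by_4:+.3f}")
print(f"exact probability change:  {p_attractive - p_baseline:+.3f}")
print(f"odds ratio:                {odds_ratio:.3f}")
print(f"probability ratio:         {relative_risk:.3f}")
```

With the rounded coefficient the probability of a boy drops from 0.52 to about 0.44 (about 7.7 percentage points), matching the divide-by-4 figure. The odds ratio comes out near 0.73 rather than the quoted 0.74, presumably because the published coefficient is rounded. Only the odds fall by about a quarter; in relative terms the probability of a son falls by about 15% (0.08/0.52), while the probability of a girl rises by about 17% (0.08/0.48).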
