
What is Simpson’s paradox?

The same data can support two opposite conclusions, depending on how it is grouped. Math and statistics alone cannot say which conclusion is right: choosing between the grouped and the combined view depends on causality and on factors outside the data.

Suppose we test educational performance in schools and compare Texas to Wisconsin. Overall, students in Wisconsin get higher scores, which suggests that schools are better in Wisconsin than in Texas. However, when the data is split by socioeconomic group (unfortunately using race/ethnicity as a proxy), well-off students in Texas actually did better than well-off students in Wisconsin, and poor students in Texas did better than poor students in Wisconsin. Yet when the results are combined, Wisconsin outperforms Texas overall. This is because Wisconsin has a far larger share of well-off children than Texas, and since well-off children tend to get higher test scores than poor children, the ‘weight’ of the large number of well-off children in Wisconsin reverses the within-group comparisons.
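The reversal is easy to reproduce with a weighted average. Here is a minimal sketch in Python. The scores and student counts are made up purely for illustration (they are not real Texas or Wisconsin data), and the names `data` and `overall_mean` are hypothetical.

```python
# Hypothetical numbers, for illustration only: (average score, number of students).
# Texas wins inside each group, but Wisconsin has far more well-off students.
data = {
    "Texas":     {"well_off": (80, 200), "poor": (60, 800)},
    "Wisconsin": {"well_off": (78, 800), "poor": (58, 200)},
}


def overall_mean(groups):
    """Average score across all students, weighting each group by its size."""
    total_students = sum(count for _, count in groups.values())
    total_score = sum(mean * count for mean, count in groups.values())
    return total_score / total_students


# Within-group comparison: Texas is ahead in every group.
for group in ("well_off", "poor"):
    texas_mean = data["Texas"][group][0]
    wisconsin_mean = data["Wisconsin"][group][0]
    print(f"{group}: Texas {texas_mean} vs Wisconsin {wisconsin_mean}")

# Combined comparison: the group sizes flip the result.
overall = {state: overall_mean(groups) for state, groups in data.items()}
for state, mean in overall.items():
    print(f"overall {state}: {mean:.1f}")
```

With these made-up numbers, Texas leads by 2 points in both groups, yet the combined averages come out to 64.0 for Texas and 74.0 for Wisconsin, because 80% of the hypothetical Wisconsin students are in the higher-scoring group.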

[Image from Wikipedia]
