Do the rich underestimate and the poor overestimate their income?

Benjamin Vincent
May 29, 2021

An article in the Guardian by @philipoltermann, entitled “German voters’ view of personal wealth causes problems for the left”, reported on a research study claiming that everyone sees themselves as middle class. The core result was portrayed in this figure:

It seems obvious from this data that rich people underestimate their income and poor people overestimate their income. But is this really the case?

On Twitter, I saw a post by Sam Schwarzkopf pointing to a blog post he wrote showing that the basic effect can be reproduced purely as a statistical artifact.

Alarm bells rang! This is basically the same situation that played out with the Dunning-Kruger effect a few months back. The claim there is that people with high cognitive abilities underestimate themselves, whereas people with low cognitive abilities overestimate themselves. This had come under fire for being a statistical artifact in a blog post by Jonathan Jarry. But I didn’t think this was quite right, and I wrote about it in a post called The Dunning-Kruger effect probably is real.

In that post, I argued that the Dunning-Kruger effect was ‘real’ in that the empirical results are correct, even if the interpretation is not. It turns out that the basic effect of people at the high end underestimating and people at the low end overestimating is not just a statistical artifact.

This effect can be accounted for by the ‘noise + bias’ model, which treats people as having uncertainty about their cognitive abilities, income, or whatever, and also a degree of bias. Importantly, the bias in this model is the same for all people.

Figure: a noise + bias model of a participant p. The objective measure of ability, o, is a noisy measurement of true ability, x. The subjective self-estimate of ability, s, is a noisy estimate of the true ability plus a bias.
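Written out as equations, the model in the caption might look like the following. This is a sketch rather than the original figure: the noise terms ε and the symbol b for the bias are notation chosen here, and the standard normal prior on x and the noise scales σ_o and σ_s are assumptions taken from the simulation code later in this post.

```latex
\begin{aligned}
x_p &\sim \mathcal{N}(0, 1) \\
o_p &= x_p + \epsilon_{o,p}, \qquad \epsilon_{o,p} \sim \mathcal{N}(0, \sigma_o^2) \\
s_p &= x_p + b + \epsilon_{s,p}, \qquad \epsilon_{s,p} \sim \mathcal{N}(0, \sigma_s^2)
\end{aligned}
```

Note that b carries no subscript p: every participant shares the same bias.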

If you are interested, go read my post, The Dunning-Kruger effect probably is real, for more information.

In this post, I will show how the same model can account for these German income perception results. If you are not interested in the code, then skip to the end, but I’ll include it so that people can see exactly what I’ve done — feel free to get in touch on Twitter if you spot any issues.

import numpy as np
from scipy.stats import norm
import matplotlib.pyplot as plt
deciles = [10, 20, 30, 40, 50, 60, 70, 80, 90]
groups = np.arange(1, 10 + 1)

Simulate a little world of 1 million people. Each person is somewhere on the income scale, 𝑥. We take an objective, but noisy, measurement of their true income 𝑜. And we ask each person for their own subjective assessment of their income 𝑠. This subjective self-assessment is noisy (each person has uncertainty) and has some level of bias.

def generate_data(N=1_000_000, bias=0, σo=1, σs=1):
    # true income (standard normal across the population)
    x = norm.rvs(size=N)
    # objective, but noisy, measure of income
    o = x + norm.rvs(loc=0, scale=σo, size=N)
    # subjective self-assessment: noisy and shifted by the bias
    s = x + bias + norm.rvs(loc=0, scale=σs, size=N)
    # group participants into deciles (1 to 10) based on objective measure
    q = np.digitize(o, np.percentile(o, deciles)) + 1
    return (x, o, s, q)
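The decile grouping on the last line of the function is easy to misread, so here is a small standalone check of how it behaves. It is only a sketch: the sample size, the seed, and the use of `np.random.default_rng` in place of `scipy.stats.norm` are choices made for this illustration.

```python
import numpy as np

rng = np.random.default_rng(0)  # seed chosen only for reproducibility
N = 100_000
o = rng.normal(size=N)  # stand-in for the noisy objective measure

# The 10th, 20th, ..., 90th percentiles are the nine edges between deciles
edges = np.percentile(o, [10, 20, 30, 40, 50, 60, 70, 80, 90])

# np.digitize returns 0 below the first edge and 9 at or above the last,
# so adding 1 gives group labels 1 (lowest decile) to 10 (highest decile)
q = np.digitize(o, edges) + 1

labels, counts = np.unique(q, return_counts=True)
print(labels)
print(counts)
```

Each label should cover about a tenth of the sample, which is what lets the plotting code later treat the groups as income deciles.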

Make a plot function to replicate that shown in the Guardian article.

def plot(ax, objective, subjective, title):

    for o, s in zip(objective, subjective):
        ax.plot([0, 1], [o, s], c="crimson", lw=3)
        ax.plot([0], [o], "o", c="crimson", ms=7)
        ax.plot([1], [s], "o", c="crimson", mfc="w", ms=7)

    ax.set(
        xticks=[0, 1],
        xticklabels=["Actual", "Perceived"],
        xlabel="Income",
        yticks=np.linspace(0, 100, 11),
        title=title,
    )

    ax.grid(axis="y")

Use the code we wrote to create a figure showing the results of 2 simulations.

  1. A simulated world of 1 million people who have uncertainty about their income, but no bias.
  2. A simulated world of 1 million people who have uncertainty about their income and where everyone underestimates their income.
fig, ax = plt.subplots(1, 2, figsize=(9, 6), sharey=True)
# NOISE ONLY MODEL =======================================
x, o, s, q = generate_data(bias=0, σo=1.5, σs=1.5)
# normalise to uniform over 0-100 scale
s = norm.cdf(s / np.std(s)) * 100
o = norm.cdf(o / np.std(o)) * 100
# Calculate means for each decile
s_mean = [np.mean(s[q == group]) for group in groups]
o_mean = [np.mean(o[q == group]) for group in groups]
plot(ax[0], o_mean, s_mean, title="Noise")
ax[0].set(ylabel="Income decile")
# NOISE + BIAS MODEL =====================================
x, o, s, q = generate_data(bias=-0.5, σo=1.5, σs=1.5)
# normalise to uniform over 0-100 scale
s = norm.cdf(s / np.std(s)) * 100
o = norm.cdf(o / np.std(o)) * 100
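# Calculate medians for each decile (variable names kept from the panel above)
s_mean = [np.median(s[q == group]) for group in groups]
o_mean = [np.median(o[q == group]) for group in groups]
# plot() draws on the axis in place and returns None, so don't reassign ax[1]
plot(ax[1], o_mean, s_mean, title="Noise + underestimation")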
# FIGURE FORMATTING ======================================
plt.figtext(
    0.5,
    0,
    "Figure by Benjamin Vincent, @inferencelab",
    ha="center",
    fontsize=10,
    fontdict={"color": "grey"},
)
fig.tight_layout()

Which gives our final results:

The figure on the left shows the results of a noise-only model. Here, actual income is a noisy measure of people’s true income and perceived income is a person’s noisy estimate of their own income. We can see that this captures one of the basic features of the results, namely a regression-toward-the-mean effect.
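The size of this regression toward the mean can be worked out for the noise-only model. Assuming true income is standard normal and the measurement noise has standard deviation σo, the expected self-assessment given the measured income is E[s | o] = bias + o / (1 + σo²), so the slope is 1 / (1 + σo²), which is less than 1 whenever there is any measurement noise. The sketch below checks this in raw units, before the 0 to 100 normalisation used for the figure. The names `sigma_o`, `sigma_s` and the seed are choices made for this illustration.

```python
import numpy as np

rng = np.random.default_rng(1)  # seed chosen only for reproducibility
N = 1_000_000
sigma_o, sigma_s, bias = 1.5, 1.5, 0.0

x = rng.normal(size=N)                             # true income
o = x + rng.normal(scale=sigma_o, size=N)          # noisy objective measure
s = x + bias + rng.normal(scale=sigma_s, size=N)   # noisy self-assessment

# Least-squares slope of s regressed on o
slope = np.cov(o, s)[0, 1] / np.var(o, ddof=1)
# Slope predicted by the model: Var(x) / (Var(x) + sigma_o^2) with Var(x) = 1
expected = 1 / (1 + sigma_o**2)

print(f"simulated slope: {slope:.3f}, predicted: {expected:.3f}")
```

Neither `sigma_s` nor `bias` enters the slope: the subjective noise only adds scatter, and the bias only shifts the line up or down.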

However, there is another important feature of the income perception data: a downwards trend. This aspect of the data is captured by the ‘noise + underestimation’ model shown on the right. This model basically states that people’s estimates of their place in the income distribution are noisy and that everyone underestimates their income.

So the immediately appealing interpretation of the data, made both for the Dunning-Kruger effect and for the income perception study, is that people high on the scale underestimate and people low on the scale overestimate.

But the ‘noise + underestimation’ model shows that the core features of the data can be explained by people having uncertainty about where they are in the income distribution and that everybody underestimates where they are in the income distribution.
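One way to see the difference between the two interpretations is to group the same simulated people in two ways. The sketch below works in raw units rather than on the 0 to 100 scale of the figure. `decile_means` is a helper invented for this illustration, and the parameter values mirror the ‘noise + underestimation’ panel.

```python
import numpy as np

rng = np.random.default_rng(2)  # seed chosen only for reproducibility
N = 1_000_000
sigma_o, sigma_s, bias = 1.5, 1.5, -0.5
edges_pct = np.arange(10, 100, 10)  # 10, 20, ..., 90

x = rng.normal(size=N)                             # true income
o = x + rng.normal(scale=sigma_o, size=N)          # noisy objective measure
s = x + bias + rng.normal(scale=sigma_s, size=N)   # noisy, biased self-assessment


def decile_means(values, by):
    """Mean of `values` within each decile of `by` (deciles 1 to 10)."""
    q = np.digitize(by, np.percentile(by, edges_pct)) + 1
    return np.array([values[q == g].mean() for g in range(1, 11)])


# Grouped by the noisy measurement: the apparent error changes sign,
# positive in the lowest deciles and negative in the highest
apparent = decile_means(s - o, by=o)

# Grouped by true income: the error is the same bias in every decile
actual = decile_means(s - x, by=x)

print(np.round(apparent, 2))
print(np.round(actual, 2))
```

Grouping by measured income makes the bottom deciles look like overestimators, even though in this simulated world every person underestimates by the same amount on average.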

Take home messages?

  • It might look like rich people underestimate their income and that poor people overestimate their income.
  • But if you simulate a world where everyone has uncertainty about their income and where everyone underestimates their income, then you can get results like those reported in the study.
  • The data is consistent with a model where people have uncertainty about their relative income but everyone underestimates their income.
  • It might be the case that poor people overestimate their income but a solid case for this has not been made.

Niche academic caveats

  • You could imagine a more complex noise + bias model where bias varies as a function of true income. This kind of model would be able to capture the initial proposal (rich people underestimate, poor people overestimate). I am not claiming that that model is not ‘true’ — but to make a convincing argument for that model, you would have to quantitatively compare all of these models and find that the more complex model fits the data better even after accounting for model complexity.
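For concreteness, here is a minimal sketch of what such a model could look like, assuming the bias is a linear function of true income. The function `generate_data_varying_bias` and its parameters `b0` and `b1` are hypothetical and are not part of the original analysis. It only generates data and does not perform the model comparison described above.

```python
import numpy as np


def generate_data_varying_bias(N=1_000_000, b0=0.0, b1=-0.5,
                               sigma_o=1.0, sigma_s=1.0, seed=0):
    """Hypothetical variant where bias = b0 + b1 * x depends on true income x.

    Setting b1 = 0 recovers the constant-bias model used in this post.
    """
    rng = np.random.default_rng(seed)
    x = rng.normal(size=N)                        # true income
    o = x + rng.normal(scale=sigma_o, size=N)     # noisy objective measure
    bias = b0 + b1 * x                            # bias varies with income
    s = x + bias + rng.normal(scale=sigma_s, size=N)
    return x, o, s


x, o, s = generate_data_varying_bias()
error = s - x  # positive means overestimation

# With b1 < 0, high earners underestimate and low earners overestimate
print(error[x > 1].mean(), error[x < -1].mean())
```

With a negative `b1` the sign of the error really does depend on true income, which is the pattern the original interpretation proposes.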

Code

The GitHub repository of the code is available here: drbenvincent/middle-class.

Benjamin Vincent

Data scientist. Currently focusing on Bayesian inference and causal reasoning. Side interest in MMT.