The Dunning-Kruger effect probably is real
The Dunning-Kruger effect refers to the idea that people with low ability levels overestimate their ability by a lot, and people with high ability levels underestimate their ability by a little (Kruger & Dunning, 1999). This conclusion is largely based on quantile plots where objectively measured ability and subjectively estimated ability are plotted as a function of objective ability quartile.
The fact that the self-perceived ability of those with low ability is higher than expected based on actual test scores has been used to argue that low ability participants over-estimate their ability by a lot. Similarly, the fact that the perceived ability of those with high ability is lower than expected based on test scores has been used to argue that high ability participants underestimate their ability by a little.
Recently, however, simulations have shown that this basic result can be generated when no over- or under-estimation effect exists (Ackerman, Beier, & Bowen, 2002; Nuhfer, Cogan, Fleisher, Gaze, & Wirth, 2016). This was echoed in a blog post by Jonathan Jarry, in which the following plot (created by Patrick McKnight) was used to argue that the Dunning-Kruger effect is not real.
If the claims of over- and under-estimation biases are based upon quartile plots, and this basic pattern of results can be generated from simulated null models with no bias, then this is worrying. It suggests that the Dunning-Kruger effect is artifactual, being the result of measurement error alone. But is this the case?
Not being satisfied with someone else’s plot (with no code to inspect), I thought I’d create my own data-generating model. It can be considered a noise + bias model in which each participant has a true ability, x, and their subjective ability score is a noisy estimate of their true ability plus some bias.
We can try to replicate the figure above (also see Nuhfer et al., 2016) by setting bias=0. Let’s code this up in Python! First, let’s import the toolboxes we will need.
import numpy as np
from scipy.stats import norm
import matplotlib.pyplot as plt
We can write a function to generate simulated data like this:
def generate_data(N=1_000_000, bias=0, σo=1, σs=1):
    # true ability
    x = norm.rvs(size=N)
    # objective measure of ability
    o = x + norm.rvs(loc=0, scale=σo, size=N)
    # subjective measure of ability
    s = x + bias + norm.rvs(loc=0, scale=σs, size=N)
    # group participants into quartiles based on objective measure
    q = np.digitize(o, np.percentile(o, [25, 50, 75])) + 1
    return (x, o, s, q)
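To see what the quartile-assignment line is doing, here is a minimal, self-contained check. It is only a sketch: the seed and the smaller sample size are arbitrary choices, and `o` is a stand-in for the objective scores. `np.percentile` returns the three cut points, and `np.digitize` maps each score to a bin index from 0 to 3, so adding 1 gives quartile labels 1 to 4.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, for reproducibility only
o = rng.normal(size=100_000)    # stand-in for the objective scores

# Three cut points split the scores into four groups
edges = np.percentile(o, [25, 50, 75])
# digitize gives 0..3 (0 = below the first cut point), so shift to 1..4
q = np.digitize(o, edges) + 1

labels, counts = np.unique(q, return_counts=True)
print(labels)            # quartile labels present in the data
print(counts / o.size)   # fraction of participants in each quartile
```

Each quartile should hold about a quarter of the participants, which is what makes the per-quartile means in the plots comparable.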
And we can define some plotting functions:
def plot_subjective_ability(o, s, q, ax):
    # Calculate means for each quartile
    s_mean = [np.mean(s[q == group]) for group in [1, 2, 3, 4]]
    o_mean = [np.mean(o[q == group]) for group in [1, 2, 3, 4]]
    # Convert to percentiles, based on the observed score
    s_mean = norm.cdf(s_mean, loc=0, scale=np.std(o_mean)) * 100
    ax.plot([1, 2, 3, 4], s_mean, "o-", lw=6, ms=12,
            label="subjective ability")

def format_quartile_plot(ax=None):
    ax.plot([1, 4], [12.5, 87.5], "k-", label="identity line")
    ax.set(
        xlabel="Quartile of observed performance",
        ylabel="Percentile estimate",
        xticks=[1, 2, 3, 4],
        yticks=np.linspace(0, 100, 11),
        ylim=[0, 100],
    )
Now we are in a position to run our first simulation. Here we will simulate an experiment with 1 million participants, each of whom has a true ability drawn from a standard normal distribution.
We can generate the plot below using the following code. Again, note that by setting bias=0, the model becomes a noise-only model with no bias to be seen.
fig, ax = plt.subplots(figsize=(6, 6))
x, o, s, q = generate_data(bias=0, σo=2, σs=2)
plot_subjective_ability(o, s, q, ax)
format_quartile_plot(ax)
So we have basically replicated the simulation results of Nuhfer et al (2016), and those presented in Jonathan Jarry’s blog post. What does it tell us?
If the over- and under-estimates from the subjective ability curve over each ability quartile are used as a basis to argue for over- and under-estimation bias, then this is indeed quite worrying. If that pattern of results can be generated by a noise-only model, then I agree that this result (taken on its own) does raise serious questions about the origin of the Dunning-Kruger effect.
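The point can also be checked numerically without any plotting. The snippet below is a minimal sketch of the same noise-only model, assuming an arbitrary seed, a smaller sample, and ASCII names (`sigma_o`, `sigma_s`) in place of σo and σs. It compares raw scores rather than percentiles: across everyone, the self-estimate is unbiased relative to true ability, yet within quartiles of the objective score the subjective score sits above the objective one at the bottom and below it at the top.

```python
import numpy as np

rng = np.random.default_rng(1)   # arbitrary seed
N, sigma_o, sigma_s = 200_000, 2.0, 2.0

x = rng.normal(size=N)                       # true ability
o = x + rng.normal(scale=sigma_o, size=N)    # objective score
s = x + rng.normal(scale=sigma_s, size=N)    # subjective score, bias = 0
q = np.digitize(o, np.percentile(o, [25, 50, 75])) + 1

# Average self-estimate error relative to true ability: close to zero
overall_error = np.mean(s - x)
# Average gap between subjective and objective score within each quartile
gaps = np.array([np.mean(s[q == g] - o[q == g]) for g in (1, 2, 3, 4)])

print(f"overall mean of s - x: {overall_error:+.3f}")
for g, gap in zip((1, 2, 3, 4), gaps):
    print(f"quartile {g}: mean of s - o = {gap:+.2f}")
```

The apparent over- and under-estimation comes from selecting participants on a noisy score: those in the bottom quartile partly got there through bad luck on the test, which their independent self-estimates do not share.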
However, we can see that the no-bias model line does not capture the data as seen in Kruger & Dunning (1999). Comparing to the first plot in this post, we can see that the whole curve is shifted downwards.
What happens when we add some estimation bias and evaluate our noise + bias model? This time we will run 3 simulations with bias=[-1, 0, +1], akin to running 3 experiments with 3 different levels of bias. Note that the values of bias are in units of standard deviations of true ability.
fig, ax = plt.subplots(figsize=(6, 6))
# Define parameters, each tuple is (bias, σo, σs)
parameter_set = [(0, 2, 2), (+1, 2, 2), (-1, 2, 2)]
for θ in parameter_set:
    bias, σo, σs = θ
    x, o, s, q = generate_data(bias=bias, σo=σo, σs=σs)
    plot_subjective_ability(o, s, q, ax)
format_quartile_plot(ax)
Well, this is very interesting. The blue line corresponds to bias=0 and reproduces the previous simulated experiment. However, the orange line shows a simulated experiment where all participants systematically overestimate their ability by 1 standard deviation, bias=1. See how this result is now much closer to the empirically observed data seen in Kruger & Dunning (1999)? This suggests that the basic Dunning-Kruger effect is better accounted for by a noise + bias model than by a noise-only model.
The green curve corresponds to a situation where all participants systematically under-estimate their ability by 1 standard deviation, bias=-1.
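One way to read the three curves: because true ability and both noise terms in the simulation are independent normals, the expected subjective score given an objective score is a straight line, E[s | o] = bias + o / (1 + σo²). Noise in the objective measure flattens the slope (to 1/5 when σo=2), and the bias only moves the whole line up or down. The following is a sketch that checks this reading by fitting a line to simulated raw scores; it is not taken from the cited papers, and the seed, sample size, and ASCII parameter names are assumptions of the example.

```python
import numpy as np

rng = np.random.default_rng(2)   # arbitrary seed
N, sigma_o, sigma_s = 200_000, 2.0, 2.0

x = rng.normal(size=N)                       # true ability
o = x + rng.normal(scale=sigma_o, size=N)    # objective score

expected_slope = 1 / (1 + sigma_o**2)        # 0.2 for sigma_o = 2
fits = {}
for bias in (-1.0, 0.0, 1.0):
    s = x + bias + rng.normal(scale=sigma_s, size=N)
    # Least-squares line s ≈ slope * o + intercept
    slope, intercept = np.polyfit(o, s, 1)
    fits[bias] = (slope, intercept)
    print(f"bias={bias:+.0f}: slope={slope:.3f} "
          f"(expected {expected_slope:.3f}), intercept={intercept:+.3f}")
```

The fitted slope should be about the same for all three bias values, while the intercept tracks the bias, which matches the vertical shifts seen in the quartile plot.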
After building this model, I became aware that I’d touched on an argument made by Burson et al (2006). They proposed a noise+bias model akin to the one I have presented. They go further in a number of ways however. Firstly, they also explore the notion that the estimation noise of one’s subjective ability may increase as one’s true ability decreases. This amounts to suggesting that people with low ability have higher uncertainty about their own true ability. Secondly, Burson et al (2006) manipulate task difficulty in an attempt to manipulate people’s estimation biases. They do in fact empirically find that harder tasks shift the curve downwards, such that the majority of people now underestimate their abilities. This is powerful evidence in favour of their noise + bias explanation of the Dunning-Kruger effect.
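The first of those ideas, noisier self-estimates at lower ability, is easy to bolt onto the simulation. The sketch below is purely illustrative: the function name `generate_heteroscedastic`, the exponential form of the noise, and the `slope` parameter are assumptions made for this example and are not the functional form used by Burson et al. (2006).

```python
import numpy as np

def generate_heteroscedastic(N=200_000, bias=0.0, sigma_o=2.0,
                             sigma_s=2.0, slope=0.5, seed=3):
    """Hypothetical variant: subjective noise grows as true ability falls."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=N)                       # true ability
    o = x + rng.normal(scale=sigma_o, size=N)    # objective score
    # Assumed form: noise sd equals sigma_s at x = 0 and is larger for x < 0
    noise_sd = sigma_s * np.exp(-slope * x)
    s = x + bias + rng.normal(size=N) * noise_sd
    q = np.digitize(o, np.percentile(o, [25, 50, 75])) + 1
    return x, o, s, q

x, o, s, q = generate_heteroscedastic()
# Spread of the self-estimate error within each objective-score quartile
spread = [np.std(s[q == g] - x[q == g]) for g in (1, 2, 3, 4)]
print(np.round(spread, 2))
```

Under these assumptions, the self-estimate error is most variable in the bottom quartile and least variable in the top one.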
Summary:
- A noise only model, with no bias, is capable of generating systematic over- and under-estimation of one’s abilities even though there is no systematic bias present in the model.
- This alone may make one believe that the Dunning-Kruger effect is artifactual, a result of measurement error alone. However, this is incorrect.
- The empirical observations are better accounted for by a noise + bias model as I presented above. But serious readers should check out Burson et al (2006) for a much more in-depth treatment of this topic.
- Take-home message: the Dunning-Kruger effect probably is real, but you probably want to base your psychological interpretations on the noise + bias model.
In terms of recommendations, it is pretty clear that interpreting psychological processes from Dunning-Kruger style quartile plots is a bad idea. Instead, I would strongly advocate for formal model building, parameter estimation methods, and model comparison methods. As a starting point, you can try simulating your data and take it from there; it’s pretty fun to do.
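As a small taste of parameter estimation, the sketch below simulates data with a known bias and then tries to recover it. It rests on the assumptions of the noise + bias model used in this post: subjective and objective scores are on the same scale and their noise terms are independent, so that s - o = bias + noise and the mean difference estimates the bias. Real data (for example, percentile self-estimates) may not satisfy this, and the sample size and seed are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(4)   # arbitrary seed
N, true_bias, sigma_o, sigma_s = 10_000, 1.0, 2.0, 2.0

x = rng.normal(size=N)                                   # true ability
o = x + rng.normal(scale=sigma_o, size=N)                # objective score
s = x + true_bias + rng.normal(scale=sigma_s, size=N)    # subjective score

# True ability cancels in the difference, leaving bias plus noise
diff = s - o
bias_hat = diff.mean()
std_err = diff.std(ddof=1) / np.sqrt(N)

print(f"estimated bias: {bias_hat:.3f} ± {std_err:.3f} (true value {true_bias})")
```

A fuller treatment would estimate the noise parameters as well and compare the noise-only and noise + bias models formally, but recovering a known parameter from simulated data is a useful first check.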
Dr Ben Vincent is a Lecturer at University of Dundee, Scotland, UK, who tweets at https://twitter.com/inferencelab.
References
Ackerman, P. L., Beier, M. E., & Bowen, K. (2002). What we really know about our abilities and our knowledge. Personality and Individual Differences, 33(4), 587–605.
Burson, K. A., Larrick, R. P., & Klayman, J. (2006). Skilled or unskilled, but still unaware of it: How perceptions of difficulty drive miscalibration in relative comparisons. Journal of Personality and Social Psychology, 90(1), 60–77.
Kruger, J., & Dunning, D. (1999). Unskilled and unaware of it: how difficulties in recognizing one’s own incompetence lead to inflated self-assessments. Journal of Personality and Social Psychology, 77(6), 1121–1134.
Nuhfer, E., Cogan, C., Fleisher, S., Gaze, E., & Wirth, K. (2016). Random Number Simulations Reveal How Random Noise Affects the Measurements and Graphical Portrayals of Self-Assessed Competency. Numeracy, 9(1).