3.2 Calibration of Credence Levels


How do we know what the appropriate degree of confidence in any statement of fact should be? Even experts often suffer from overconfidence. We introduce practical techniques for calibrating one's confidence, as well as psychological attitudes that motivate improving one's calibration.

The Lesson in Context

This lesson continues the previous lesson, 3.1 Probabilistic Reasoning, and addresses the importance of calibrating credence levels. We illustrate with real-life examples that people, even experts, often exhibit overconfidence. By resolving the Future Predictions activity from the previous lesson, we show students how to improve their own calibration. Finally, with the Actively Open-minded Thinking and Growth Mindset surveys, we introduce psychological attitudes that can help motivate one to improve one's calibration.

Relation to Other Lessons

Earlier Lessons

1.2 Shared Reality and Modeling
  • Raft vs. pyramid: Having concrete methods to assess and calibrate credence levels allows scientists to comfortably discuss their imperfect results without being overly invested in being right. This also leaves open the possibility that scientific claims can be revised or overturned in light of new evidence. This is a strength of the scientific method, rather than a weakness.
3.1 Probabilistic Reasoning
  • The current lesson refines the previous one by introducing concrete ways to assess one's credence calibration and to improve it.

Later Lessons

5.1 False Positives and Negatives
  • With accurately calibrated credence levels, one can decide whether to act on a prediction by weighing the seriousness of its consequences. For example, a city should invest heavily in storm protection even when there is only 10% confidence that a strong hurricane will hit the city.
5.2 Scientific Optimism
  • Actively Open-minded Thinking and Growth Mindset are ways to encourage self-improvement and iterative progress, attitudes necessary to achieve scientific progress.
7.1 Causation, Blame, and Policy
  • Weakly related: Since the calibration of credence levels is a statistical quantity, it is meaningless to discuss the calibration of a single prediction. It is only meaningful to discuss the calibration of an aggregate of many predictions within a certain credence range.
9.1 Heuristics
  • The representativeness heuristic (with its associated base-rate neglect) and the availability heuristic can be sources of over- or underconfidence when estimating the likelihood of future events. Actively open-minded thinking is correlated with less use of biasing heuristics.
10.1 Confirmation Bias
  • Actively open-minded thinking is a habit of thinking that helps people minimize confirmation bias.

Takeaways

After this lesson, students should

  1. Be wary of high levels of confidence.
  2. Understand the different ways in which scientists in different fields discuss credence levels (e.g. 95% confidence interval, error bars).
  3. Appreciate that one can improve the calibration of one's credence levels, and that one should strive for accurate calibration.

Confidence Interval

A range of values within which we believe the true value lies, with some credence level.

When describing an instrumental measurement, the confidence interval corresponds to the statistical uncertainty of the measurement.

When scientists present a measurement, they are usually not presenting one specific value. They are presenting a range that the value should lie within, and the likelihood that the true value falls within that range.

Different fields have different standards for confidence intervals. Physicists typically choose confidence intervals within which they are 68% sure the true value lies. Psychologists typically use 95%.
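For instructors who want a concrete picture, here is a minimal Python sketch of how such intervals can be computed from repeated measurements; the data values are made up, and normally distributed errors are assumed:

```python
import numpy as np
from scipy import stats

# Made-up repeated measurements of the same quantity.
measurements = np.array([9.8, 10.1, 9.9, 10.3, 10.0, 9.7, 10.2, 10.0])

mean = measurements.mean()
# Standard error of the mean: sample standard deviation / sqrt(n).
sem = measurements.std(ddof=1) / np.sqrt(len(measurements))

# 68% interval (roughly mean +/- one standard error, the physics
# convention) and 95% interval (the psychology convention), using
# Student's t distribution to account for the small sample size.
for level in (0.68, 0.95):
    low, high = stats.t.interval(level, df=len(measurements) - 1,
                                 loc=mean, scale=sem)
    print(f"{level:.0%} confidence interval: [{low:.2f}, {high:.2f}]")
```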

There are other related terms, like "standard deviation," "σ," and "standard error." Try to avoid this jargon. If students ask about these terms, discuss them outside of class, and be mindful of students who haven't taken a statistics class.


Error Bars

A visual representation of the confidence interval on a graph.

Error bars are typically drawn around a data point, which usually marks the center of the confidence interval.
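For illustration, here is a minimal matplotlib sketch (with made-up data) of drawing error bars around data points:

```python
import numpy as np
import matplotlib.pyplot as plt

# Made-up data: each y value is the center of its confidence
# interval, and yerr is the interval's half-width.
x = np.array([1, 2, 3, 4])
y = np.array([2.1, 2.9, 3.8, 5.2])
yerr = np.array([0.3, 0.2, 0.4, 0.3])

# Error bars extend one half-width above and below each point.
plt.errorbar(x, y, yerr=yerr, fmt="o", capsize=4)
plt.xlabel("x")
plt.ylabel("measured value")
plt.title("Data points with 68% confidence intervals")
plt.show()
```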


Actively Open-minded Thinking (AOT)

A thinking style that emphasizes good reasoning independently of one's own beliefs: looking at issues from multiple perspectives and actively seeking out ideas on both sides. AOT predicts more accurate calibration, as well as the ability to evaluate argument quality objectively.

Growth Mindset

A mindset in which people believe that intelligence can be developed and that their abilities can be enhanced through learning. Growth mindset also predicts more accurate calibration of credence levels.

Lawyer Calibration

Lawyer calibration curve.
Lawyers give a wide range of predictions (20%–100%) for how likely they are to win any given case. However, the actual outcomes fall within a much narrower band (40%–70%): in reality, case outcomes are closer to a toss-up than lawyers' confidence suggests. (Source)

Nurse Calibration

Nurse calibration curve.
Calibration curves are shown for experienced nurses as well as for students training to become nurses. In both cases, they appear overconfident for more likely outcomes and underconfident for less likely outcomes. The nurses did not seem to become better calibrated with experience. (Source)

Note

The confidence values are all greater than 50%. Any prediction with confidence below 50% can be rephrased as a prediction of the opposite outcome with confidence above 50%. (e.g., "I'm 30% confident it will rain tomorrow" means "I'm 70% confident it will not rain tomorrow.")
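A tiny Python illustration of this rephrasing (the function is hypothetical, written just for this note):

```python
def normalize_prediction(statement: str, confidence: float):
    """Rephrase a sub-50% prediction as its opposite, so that the
    stated confidence is always greater than 50%."""
    if confidence < 0.5:
        return f"not the case that {statement}", 1 - confidence
    return statement, confidence

# "I'm 30% confident it will rain tomorrow" becomes a 70% confident
# prediction that it will not rain tomorrow.
print(normalize_prediction("it will rain tomorrow", 0.30))
# -> ('not the case that it will rain tomorrow', 0.7)
```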


Weather Forecaster Calibration

It is possible (and expected) for individual forecasts, whether of weather or of elections, to turn out to be wrong some of the time. That is fine as long as the predictions within each credence range are well calibrated overall. It is also impossible to assess the calibration of a single prediction; calibration is only defined for an aggregate of predictions. (Source)
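To make the aggregate idea concrete, here is a minimal Python sketch, using made-up forecast data, of how calibration can be checked over a collection of predictions:

```python
import numpy as np

# Made-up forecasts: (stated confidence, outcome), where outcome is 1
# if the predicted event occurred and 0 otherwise. Confidences are
# already normalized to be above 50% (see the Note above).
forecasts = [(0.6, 1), (0.6, 0), (0.7, 1), (0.7, 1), (0.7, 0),
             (0.8, 1), (0.8, 1), (0.9, 1), (0.9, 1), (0.9, 0)]

confidences = np.array([c for c, _ in forecasts])
outcomes = np.array([o for _, o in forecasts])

# Group predictions into half-open credence bins and compare stated
# confidence with the observed frequency of correct predictions.
# For a well-calibrated forecaster the two roughly agree in each bin.
for low, high in [(0.5, 0.7), (0.7, 0.9), (0.9, 1.0)]:
    mask = (confidences >= low) & (confidences < high)
    if mask.any():
        print(f"stated {low:.0%}-{high:.0%}: "
              f"observed {outcomes[mask].mean():.0%} "
              f"over {mask.sum()} predictions")
```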

Vague Verbiage and Words of Estimative Probability

Expressing likelihood with words such as "probably," "likely," or "almost certainly" instead of a numerical credence. Different people interpret these words very differently, so vague verbiage invites miscommunication; stating an explicit probability avoids the ambiguity.

He seems super confident, and she said she was only 85% sure, so we should trust him over her.

It is more important for a forecaster to be aware of how often they are wrong than to always insist they are right. His apparent confidence is vague verbiage, while her 85% is an explicit, checkable credence. If possible, use the outcomes of their past predictions to evaluate the calibration of their confidence before deciding whom to trust.

Am I well calibrated in this particular prediction?

The question doesn't really make sense: you can't have a calibration for a single prediction. A calibration level is only meaningfully defined over a collection of predictions, for which you can check how many ultimately came true.

Error bars say that the true value must lie within that range.

This is incorrect. Error bars always have some credence level associated with them (e.g., 68% or 95%) that the true value lies within the range; the true value can still fall outside the bars.
