Callibration

There are two main uses of the term calibration in statistics that denote special types of statistical inference problems. Calibration can mean

a reverse process to regression, where instead of a future dependent variable being predicted from known explanatory variables, a known observation of the dependent variables is used to predict a corresponding explanatory variable;^[1]
procedures in statistical classification to determine class membership probabilities which assess the uncertainty of a given new observation belonging to each of the already established classes.

In addition, calibration is used in statistics with the usual general meaning of calibration. For example, model calibration can be also used to refer to Bayesian inference about the value of a model's parameters, given some data set, or more generally to any type of fitting of a statistical model. As Philip Dawid puts it, "a forecaster is well calibrated if, for example, of those events to which he assigns a probability 30 percent, the long-run proportion that actually occurs turns out to be 30 percent."^[2]

In classification

Calibration in classification means turning transform classifier scores into class membership probabilities. An overview of calibration methods for two-class and multi-class classification tasks is given by Gebel (2009).^[3] A classifier might separate the classes well, but be poorly calibrated, meaning that the estimated class probabilities are far from the true class probabilities. In this case, a calibration step may help improve the estimated probabilities. A variety of metrics exist that are aimed to measure the extent to which a classifier produces well-calibrated probabilities. Foundational work includes the Expected Calibration Error (ECE).^[4] Into the 2020s, variants include the Adaptive Calibration Error (ACE) and the Test-based Calibration Error (TCE), which address limitations of the ECE metric that may arise when classifier scores concentrate on narrow subset of the [0,1] range.^[5]^[6]

A 2020s advancement in calibration assessment is the introduction of the Estimated Calibration Index (ECI).^[7] The ECI extends the concepts of the Expected Calibration Error (ECE) to provide a more nuanced measure of a model's calibration, particularly addressing overconfidence and underconfidence tendencies. Originally formulated for binary settings, the ECI has been adapted for multiclass settings, offering both local and global insights into model calibration. This framework aims to overcome some of the theoretical and interpretative limitations of existing calibration metrics. Through a series of experiments, Famiglini et al. demonstrate the framework's effectiveness in delivering a more accurate understanding of model calibration levels and discuss strategies for mitigating biases in calibration assessment. An online tool has been proposed to compute both ECE and ECI.^[8] The following univariate calibration methods exist for transforming classifier scores into class membership probabilities in the two-class case:

Assignment value approach, see Garczarek (2002)^[9]
Bayes approach, see Bennett (2002)^[10]
Isotonic regression, see Zadrozny and Elkan (2002)^[11]
Platt scaling (a form of logistic regression), see Lewis and Gale (1994)^[12] and Platt (1999)^[13]
Bayesian Binning into Quantiles (BBQ) calibration, see Naeini, Cooper, Hauskrecht (2015)^[14]
Beta calibration, see Kull, Filho, Flach (2017)^[15]

In probability prediction and forecasting

In prediction and forecasting, a Brier score is sometimes used to assess prediction accuracy of a set of predictions, specifically that the magnitude of the assigned probabilities track the relative frequency of the observed outcomes. Philip E. Tetlock employs the term "calibration" in this sense in his 2015 book Superforecasting.^[16] This differs from accuracy and precision. For example, as expressed by Daniel Kahneman, "if you give all events that happen a probability of .6 and all the events that don't happen a probability of .4, your calibration is perfect but your discrimination is miserable".^[16] In meteorology, in particular, as concerns weather forecasting, a related mode of assessment is known as forecast skill.

In regression

The calibration problem in regression is the use of known data on the observed relationship between a dependent variable and an independent variable to make estimates of other values of the independent variable from new observations of the dependent variable.^[17]^[18]^[19] This can be known as "inverse regression";^[20] there is also sliced inverse regression. The following multivariate calibration methods exist for transforming classifier scores into class membership probabilities in the case with classes count greater than two:

Reduction to binary tasks and subsequent pairwise coupling, see Hastie and Tibshirani (1998)^[21]
Dirichlet calibration, see Gebel (2009)^[3]

Example

One example is that of dating objects, using observable evidence such as tree rings for dendrochronology or carbon-14 for radiometric dating. The observation is caused by the age of the object being dated, rather than the reverse, and the aim is to use the method for estimating dates based on new observations. The problem is whether the model used for relating known ages with observations should aim to minimise the error in the observation, or minimise the error in the date. The two approaches will produce different results, and the difference will increase if the model is then used for extrapolation at some distance from the known results.

References

^ Cook, Ian; Upton, Graham (2006). Oxford Dictionary of Statistics. Oxford: Oxford University Press. ISBN 978-0-19-954145-4.
^ Dawid, A. P (1982). "The Well-Calibrated Bayesian". Journal of the American Statistical Association. 77 (379): 605–610. doi:10.1080/01621459.1982.10477856.
^ ^a ^b Gebel, Martin (2009). Multivariate calibration of classifier scores into the probability space (PDF) (PhD thesis). University of Dortmund.
^ M.P. Naeini, G. Cooper, and M. Hauskrecht, Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, 2015.
^ J. Nixon, M.W. Dusenberry, L. Zhang, G. Jerfel, & D. Tran. Measuring Calibration in Deep Learning. In: CVPR workshops (Vol. 2, No. 7), 2019.
^ T. Matsubara, N. Tax, R. Mudd, & I. Guy. TCE: A Test-Based Approach to Measuring Calibration Error. In: Proceedings of the Thirty-Ninth Conference on Uncertainty in Artificial Intelligence (UAI), PMLR, 2023.
^ Famiglini, Lorenzo, Andrea Campagner, and Federico Cabitza. "Towards a Rigorous Calibration Assessment Framework: Advancements in Metrics, Methods, and Use." ECAI 2023. IOS Press, 2023. 645-652. Doi 10.3233/FAIA230327
^ Famiglini, Lorenzo; Campagner, Andrea; Cabitza, Federico (2023), "Towards a Rigorous Calibration Assessment Framework: Advancements in Metrics, Methods, and Use", ECAI 2023, IOS Press, pp. 645–652, doi:10.3233/faia230327, hdl:10281/456604, retrieved 25 March 2024
^ U. M. Garczarek "[1] Archived 2004-11-23 at the Wayback Machine," Classification Rules in Standardized Partition Spaces, Dissertation, Universität Dortmund, 2002
^ P. N. Bennett, Using asymmetric distributions to improve text classifier probability estimates: A comparison of new and standard parametric methods, Technical Report CMU-CS-02-126, Carnegie Mellon, School of Computer Science, 2002.
^ B. Zadrozny and C. Elkan, Transforming classifier scores into accurate multiclass probability estimates. In: Proceedings of the Eighth International Conference on Knowledge Discovery and Data Mining, 694–699, Edmonton, ACM Press, 2002.
^ D. D. Lewis and W. A. Gale, A Sequential Algorithm for Training Text classifiers. In: W. B. Croft and C. J. van Rijsbergen (eds.), Proceedings of the 17th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '94), 3–12. New York, Springer-Verlag, 1994.
^ J. C. Platt, Probabilistic outputs for support vector machines and comparisons to regularized likelihood methods. In: A. J. Smola, P. Bartlett, B. Schölkopf and D. Schuurmans (eds.), Advances in Large Margin Classiers, 61–74. Cambridge, MIT Press, 1999.
^ Naeini MP, Cooper GF, Hauskrecht M. Obtaining Well Calibrated Probabilities Using Bayesian Binning. Proceedings of the . AAAI Conference on Artificial Intelligence AAAI Conference on Artificial Intelligence. 2015;2015:2901-2907.
^ Meelis Kull, Telmo Silva Filho, Peter Flach; Proceedings of the 20th International Conference on Artificial Intelligence and Statistics, PMLR 54:623-631, 2017.
^ ^a ^b "Edge Master Class 2015: A Short Course in Superforecasting, Class II". edge.org. Edge Foundation. 24 August 2015. Retrieved 13 April 2018. Calibration is when I say there's a 70 percent likelihood of something happening, things happen 70 percent of time.
^ Brown, P.J. (1994) Measurement, Regression and Calibration, OUP. ISBN 0-19-852245-2
^ Ng, K. H., Pooi, A. H. (2008) "Calibration Intervals in Linear Regression Models", Communications in Statistics - Theory and Methods, 37 (11), 1688–1696. [2]
^ Hardin, J. W., Schmiediche, H., Carroll, R. J. (2003) "The regression-calibration method for fitting generalized linear models with additive measurement error", Stata Journal, 3 (4), 361–372. link, pdf
^ Draper, N.L., Smith, H. (1998) Applied Regression analysis, 3rd Edition, Wiley. ISBN 0-471-17082-8
^ T. Hastie and R. Tibshirani, "[3]," Classification by pairwise coupling. In: M. I. Jordan, M. J. Kearns and S. A. Solla (eds.), Advances in Neural Information Processing Systems, volume 10, Cambridge, MIT Press, 1998.

[1] Cook, Ian; Upton, Graham (2006). Oxford Dictionary of Statistics. Oxford: Oxford University Press. ISBN 978-0-19-954145-4.

[2] Dawid, A. P (1982). "The Well-Calibrated Bayesian". Journal of the American Statistical Association. 77 (379): 605–610. doi:10.1080/01621459.1982.10477856.

[Gebel2009-3] Gebel, Martin (2009). Multivariate calibration of classifier scores into the probability space (PDF) (PhD thesis). University of Dortmund.

[4] M.P. Naeini, G. Cooper, and M. Hauskrecht, Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, 2015.

[5] J. Nixon, M.W. Dusenberry, L. Zhang, G. Jerfel, & D. Tran. Measuring Calibration in Deep Learning. In: CVPR workshops (Vol. 2, No. 7), 2019.

[6] T. Matsubara, N. Tax, R. Mudd, & I. Guy. TCE: A Test-Based Approach to Measuring Calibration Error. In: Proceedings of the Thirty-Ninth Conference on Uncertainty in Artificial Intelligence (UAI), PMLR, 2023.

[7] Famiglini, Lorenzo, Andrea Campagner, and Federico Cabitza. "Towards a Rigorous Calibration Assessment Framework: Advancements in Metrics, Methods, and Use." ECAI 2023. IOS Press, 2023. 645-652. Doi 10.3233/FAIA230327

[8] Famiglini, Lorenzo; Campagner, Andrea; Cabitza, Federico (2023), "Towards a Rigorous Calibration Assessment Framework: Advancements in Metrics, Methods, and Use", ECAI 2023, IOS Press, pp. 645–652, doi:10.3233/faia230327, hdl:10281/456604, retrieved 25 March 2024

[9] U. M. Garczarek "[1] Archived 2004-11-23 at the Wayback Machine," Classification Rules in Standardized Partition Spaces, Dissertation, Universität Dortmund, 2002

[10] P. N. Bennett, Using asymmetric distributions to improve text classifier probability estimates: A comparison of new and standard parametric methods, Technical Report CMU-CS-02-126, Carnegie Mellon, School of Computer Science, 2002.

[11] B. Zadrozny and C. Elkan, Transforming classifier scores into accurate multiclass probability estimates. In: Proceedings of the Eighth International Conference on Knowledge Discovery and Data Mining, 694–699, Edmonton, ACM Press, 2002.

[12] D. D. Lewis and W. A. Gale, A Sequential Algorithm for Training Text classifiers. In: W. B. Croft and C. J. van Rijsbergen (eds.), Proceedings of the 17th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '94), 3–12. New York, Springer-Verlag, 1994.

[13] J. C. Platt, Probabilistic outputs for support vector machines and comparisons to regularized likelihood methods. In: A. J. Smola, P. Bartlett, B. Schölkopf and D. Schuurmans (eds.), Advances in Large Margin Classiers, 61–74. Cambridge, MIT Press, 1999.

[14] Naeini MP, Cooper GF, Hauskrecht M. Obtaining Well Calibrated Probabilities Using Bayesian Binning. Proceedings of the . AAAI Conference on Artificial Intelligence AAAI Conference on Artificial Intelligence. 2015;2015:2901-2907.

[15] Meelis Kull, Telmo Silva Filho, Peter Flach; Proceedings of the 20th International Conference on Artificial Intelligence and Statistics, PMLR 54:623-631, 2017.

[Edge-II-16] "Edge Master Class 2015: A Short Course in Superforecasting, Class II". edge.org. Edge Foundation. 24 August 2015. Retrieved 13 April 2018. Calibration is when I say there's a 70 percent likelihood of something happening, things happen 70 percent of time.

[17] Brown, P.J. (1994) Measurement, Regression and Calibration, OUP. ISBN 0-19-852245-2

[18] Ng, K. H., Pooi, A. H. (2008) "Calibration Intervals in Linear Regression Models", Communications in Statistics - Theory and Methods, 37 (11), 1688–1696. [2]

[19] Hardin, J. W., Schmiediche, H., Carroll, R. J. (2003) "The regression-calibration method for fitting generalized linear models with additive measurement error", Stata Journal, 3 (4), 361–372. link, pdf

[20] Draper, N.L., Smith, H. (1998) Applied Regression analysis, 3rd Edition, Wiley. ISBN 0-471-17082-8

[21] T. Hastie and R. Tibshirani, "[3]," Classification by pairwise coupling. In: M. I. Jordan, M. J. Kearns and S. A. Solla (eds.), Advances in Neural Information Processing Systems, volume 10, Cambridge, MIT Press, 1998.

[1]

[2]

[3]

[4]

[5]

[6]

[7]

[8]

[9]

[10]

[11]

[12]

[13]

[14]

[15]

[16]

[17]

[18]

[19]

[20]

[21]

Cookie	Duration	Description
_GRECAPTCHA	5 months 27 days	This cookie is set by Google. In addition to certain standard Google cookies, reCAPTCHA sets a necessary cookie (_GRECAPTCHA) when executed for the purpose of providing its risk analysis.
cookielawinfo-checkbox-advertisement	1 year	The cookie is set by GDPR cookie consent to record the user consent for the cookies in the category "Advertisement".
cookielawinfo-checkbox-analytics	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Analytics".
cookielawinfo-checkbox-functional	11 months	The cookie is set by GDPR cookie consent to record the user consent for the cookies in the category "Functional".
cookielawinfo-checkbox-necessary	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookies is used to store the user consent for the cookies in the category "Necessary".
cookielawinfo-checkbox-others	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Other.
cookielawinfo-checkbox-performance	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Performance".
PHPSESSID	session	This cookie is native to PHP applications. The cookie is used to store and identify a users' unique session ID for the purpose of managing user session on the website. The cookie is a session cookies and is deleted when all the browser windows are closed.
viewed_cookie_policy	11 months	The cookie is set by the GDPR Cookie Consent plugin and is used to store whether or not user has consented to the use of cookies. It does not store any personal data.

Cookie	Duration	Description
__gads	1 year 24 days	This cookie is set by Google and stored under the name dounleclick.com. This cookie is used to track how many times users see a particular advert which helps in measuring the success of the campaign and calculate the revenue generated by the campaign. These cookies can only be read from the domain that it is set on so it will not track any data while browsing through another sites.
_ga	2 years	This cookie is installed by Google Analytics. The cookie is used to calculate visitor, session, campaign data and keep track of site usage for the site's analytics report. The cookies store information anonymously and assign a randomly generated number to identify unique visitors.
_ga_3TWNWEDH70	2 years	This cookie is installed by Google Analytics.
_gat_gtag_UA_203667330_1	1 minute	This cookie is set by Google and is used to distinguish users.
_gid	1 day	This cookie is installed by Google Analytics. The cookie is used to store information of how visitors use a website and helps in creating an analytics report of how the website is doing. The data collected including the number visitors, the source where they have come from, and the pages visted in an anonymous form.
rmuid	1 year	This cookie is provided by Linksynergy. It is used for advertising purposes. This cookie is used for create and storing unique user ID.
WMF-Last-Access	1 month 20 hours 4 minutes	This cookie is used to calculate unique devices accessing the website.
WMF-Last-Access-Global	1 month 20 hours 4 minutes	This cookie is used to count how many times a website has been visited by different visitor. This is done by assigning the visitor an ID.

Cookie	Duration	Description
IDE	1 year 24 days	Used by Google DoubleClick and stores information about how the user uses the website and any other advertisement before visiting the website. This is used to present users with ads that are relevant to them according to the user profile.
NID	6 months	This cookie is used to a profile based on user's interest and display personalized ads to the users.
test_cookie	15 minutes	This cookie is set by doubleclick.net. The purpose of the cookie is to determine if the user's browser supports cookies.

Cookie	Duration	Description
__gpi	1 year 24 days	No description
Captcha	session	No description available.
GeoIP		No description
GoogleAdServingTest	session	No description
S	1 hour	No description available.

codefinance.training

Callibration

Callibration

In classification

In probability prediction and forecasting

In regression

Example

See also

References