

This blog post gives a brief introduction to using machine learning for default prediction and summarizes the results of our paper Grabit: Gradient tree-boosted Tobit models for default prediction, published in the Journal of Banking & Finance.

Predicting company defaults

Default prediction has been of major interest to both researchers and practitioners in the financial sector for almost a century. Early proposals for statistical default prediction focused primarily on linear classification models such as linear discriminant analysis (Altman, 1968) or logistic regression (Ohlson, 1980). More recently, to account for nonlinear dependencies between features and default events, popular machine learning models such as neural networks, classification trees, and ensemble methods have been applied to default prediction (see e.g. Brown, 2012).

A common problem in default prediction is class imbalance between defaults and non-defaults, i.e., typically only a small fraction of the observations in a given dataset are defaults. This poses a problem for machine learning models, as the number of observed defaults might be too small to identify patterns in the data and use them to accurately predict future defaults.

Using auxiliary data for default prediction

In many cases, financial institutions collect additional information about the performance of a company or loan, such as delays in repayment or changes in ratings. Intuitively, such auxiliary data is closely related to default events, since loans with large delays in repayment are likely "closer" to default than loans without any delays. Yet, traditional classification methods typically neglect this additional information and only consider binary default events.

To combine the binary default data with continuous auxiliary data, we build on top of the Tobit model, a commonly used censored regression model. From an economic point of view, the Tobit model learns the default potential of a company, as represented, for example, by the number of delay days. In the model, a default occurs if the default potential exceeds a so-called default threshold. Note that additional information such as delay days is not observed for default cases, so the exact value of the default potential is only known below the default threshold (see Figure 1).

Figure 1: (Left) Binary default classification. (Right) Linear regression on delay days for non-default cases.
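To make the censoring idea behind the Tobit model more concrete, the following is a minimal sketch of a Tobit negative log-likelihood with censoring at the default threshold. It is an illustration under stated assumptions, not the implementation from the paper: the function names, the threshold of 90 delay days, the noise scale `sigma`, and the toy numbers are all hypothetical, and the predicted default potentials are simply given as lists rather than learned by a boosted tree model.

```python
import math


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def normal_logpdf(z):
    """Log density of the standard normal distribution."""
    return -0.5 * z * z - 0.5 * math.log(2.0 * math.pi)


def tobit_neg_log_likelihood(predictions, observed, threshold, sigma):
    """Negative log-likelihood of a Tobit model censored from above.

    predictions: predicted default potentials (hypothetical model output)
    observed:    observed delay days; defaults are recorded at the threshold
    threshold:   default threshold (assumed value, not from the paper)
    sigma:       standard deviation of the Gaussian noise (assumed known)
    """
    nll = 0.0
    for f, y in zip(predictions, observed):
        if y >= threshold:
            # Default: we only know the potential is at or above the threshold,
            # so the likelihood is the probability mass above the threshold.
            p = normal_cdf((f - threshold) / sigma)
            nll -= math.log(max(p, 1e-300))  # guard against log(0)
        else:
            # Non-default: the delay days are observed exactly,
            # so the likelihood is the Gaussian density at the observation.
            nll -= normal_logpdf((y - f) / sigma) - math.log(sigma)
    return nll


# Hypothetical toy data: three non-defaults and one default (last entry).
threshold = 90.0
observed = [0.0, 15.0, 60.0, 90.0]

# Two sets of predictions that differ only for the defaulted loan.
good_fit = [5.0, 20.0, 55.0, 120.0]  # predicts a potential above the threshold
poor_fit = [5.0, 20.0, 55.0, 30.0]   # predicts a potential far below it

nll_good = tobit_neg_log_likelihood(good_fit, observed, threshold, sigma=10.0)
nll_poor = tobit_neg_log_likelihood(poor_fit, observed, threshold, sigma=10.0)
print(nll_good < nll_poor)
```

In this sketch, non-default observations contribute a regression-type term on the delay days, while default observations contribute a classification-type term, which is how the binary and the continuous information enter a single loss.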
