Journal of Credit Risk
ISSN: 1755-9723 (online)
Editors-in-chief: Linda Allen and Jens Hilscher

How magic a bullet is machine learning for credit analysis? An exploration with fintech lending data
Need to know
- Consumer fintech lending has expanded greatly since the 2008 financial crisis.
- We apply machine learning techniques to loan-level data to determine how far these methods can produce more accurate out-of-sample default predictions.
- Other ML default studies have ignored the economic conditions faced by a borrower after origination, which we include in our model.
- Little statistically significant evidence is found that ML methods yield unequal benefits across subgroups of borrowers.
Abstract
Fintech lending to consumers has grown rapidly since the 2007–9 Great Recession. This study applies machine learning (ML) methods to loan-level data from the largest fintech lender of personal loans, to assess the extent to which these methods can produce more accurate out-of-sample default predictions relative to standard regression models, as argued by fintech lending’s advocates. To explain loan outcomes, this analysis accounts for the economic conditions faced by a borrower after origination, which are typically absent from other ML studies of default. For the given data, the ML methods indeed improve prediction accuracy, but more so over horizons within a year. Having more data up to but not beyond a certain quantity enhances the relative predictive accuracy of the ML methods, likely because there has been data or model drift over time, so that more complex models can suffer more out-of-sample misses. Prediction accuracy rises, but only marginally, with additional standard credit variables beyond the core set, suggesting that unconventional data needs to be sufficiently informative as a whole to help consumers with little or no credit history. Finally, in this data, we find little statistically significant evidence that ML methods yield unequal benefits across subgroups of borrowers defined by their risk attributes, income or where they live.
Copyright Infopro Digital Limited. All rights reserved.
As outlined in our terms and conditions, https://www.infopro-digital.com/terms-and-conditions/subscriptions/ (point 2.4), printing is limited to a single copy.
If you would like to purchase additional rights please email info@risk.net