Journal of Credit Risk

Risk.net

How magic a bullet is machine learning for credit analysis? An exploration with fintech lending data

J. Christina Wang and Charles B. Perkins

  • Consumer fintech lending has expanded greatly since the 2008 financial crisis.
  • We apply machine learning techniques to loan-level data to determine how far these methods can produce more accurate out-of-sample default predictions.
  • Other ML default studies have ignored the economic conditions faced by a borrower after origination, which we include in our model.
  • Little statistically significant evidence is found that ML methods yield unequal benefits across subgroups of borrowers.

Fintech lending to consumers has grown rapidly since the 2007–9 Great Recession. This study applies machine learning (ML) methods to loan-level data from the largest fintech lender of personal loans, to assess the extent to which these methods can produce more accurate out-of-sample default predictions relative to standard regression models, as argued by fintech lending’s advocates. To explain loan outcomes, this analysis accounts for the economic conditions faced by a borrower after origination, which are typically absent from other ML studies of default. For the given data, the ML methods indeed improve prediction accuracy, but more so over horizons within a year.
Having more data up to but not beyond a certain quantity enhances the relative predictive accuracy of the ML methods, likely because there has been data or model drift over time, so that more complex models can suffer more out-of-sample misses. Prediction accuracy rises, but only marginally, with additional standard credit variables beyond the core set, suggesting that unconventional data needs to be sufficiently informative as a whole to help consumers with little or no credit history. Finally, in this data, we find little statistically significant evidence that ML methods yield unequal benefits across subgroups of borrowers defined by their risk attributes, income or where they live.
