Journal of Risk Model Validation

Risk.net

A three-stage fusion model for predicting financial distress considering semantic and sentiment information

Jiaming Liu and Bo Yuan

  • A multi-dimensional feature fusion analysis framework is proposed, integrating financial indicators, textual semantics, and sentiment intensity features.
  • Sentiment features are quantified using the DUTSD dictionary, and deep semantic representations of the text are extracted using deep learning techniques.
  • A three-stage heterogeneous stacking model (HeSM) is designed, achieving significant improvements in predictive accuracy and stability through probabilistic fusion and dynamic selection mechanisms.
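
As a rough illustration of how dictionary-based sentiment intensity features might be quantified, the sketch below scores a tokenized passage against a lexicon. It makes several assumptions: the toy lexicon, its English entries, its (polarity, intensity) layout, and the three output feature names are hypothetical and do not reproduce the DUTSD dictionary or the authors' feature definitions.

```python
from typing import Dict, List, Tuple

# Hypothetical toy lexicon: word -> (polarity, intensity).
# These entries are illustrative only and are NOT taken from DUTSD.
TOY_LEXICON: Dict[str, Tuple[int, float]] = {
    "growth": (1, 5.0),
    "profit": (1, 7.0),
    "loss": (-1, 7.0),
    "decline": (-1, 5.0),
    "risk": (-1, 3.0),
}


def sentiment_features(
    tokens: List[str], lexicon: Dict[str, Tuple[int, float]]
) -> Dict[str, float]:
    """Aggregate lexicon intensities into three document-level features."""
    pos = sum(lexicon[t][1] for t in tokens if t in lexicon and lexicon[t][0] > 0)
    neg = sum(lexicon[t][1] for t in tokens if t in lexicon and lexicon[t][0] < 0)
    n_tokens = max(len(tokens), 1)  # avoid division by zero on empty text
    total = pos + neg
    return {
        # intensity per token, so long and short passages are comparable
        "pos_intensity": pos / n_tokens,
        "neg_intensity": neg / n_tokens,
        # net tone in [-1, 1]; 0.0 when no lexicon word is matched
        "net_tone": (pos - neg) / total if total else 0.0,
    }


tokens = "revenue growth offset by loss and rising risk".split()
features = sentiment_features(tokens, TOY_LEXICON)
print(features)
```

Normalizing by token count is one possible design choice; a real pipeline for Chinese MD&A text would also need word segmentation before the lexicon lookup.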

In recent years the analysis of management discussion and analysis (MD&A) text from listed companies has gradually gained attention in financial distress prediction. By integrating text analysis and machine learning techniques, this paper extracts the financial information hidden in MD&A text and captures the emotional tendency of the text through a sentiment analysis lexicon, providing a more comprehensive and detailed method for predicting a company’s financial condition. The study explores how integrating financial, semantic and sentiment features affects the ability to predict the financial distress of listed companies. To do this we propose a three-stage fusion model. First, semantic features are extracted from the MD&A sections of the annual reports of listed companies using deep learning techniques, and sentiment features are derived from the MD&A text using a sentiment dictionary. Then, initial prediction models are constructed separately on the financial, semantic and sentiment features. Finally, a stacking ensemble strategy integrates these models into a heterogeneous stacking model to improve prediction accuracy. The results indicate that financial features play a decisive role in prediction accuracy, and that introducing semantic and sentiment features significantly enhances predictive performance. Further, by comparing different algorithms (naive Bayes, random forest, extreme gradient boosting, logistic regression and ridge regression) within the model, we find that the heterogeneous stacking model not only enhances overall prediction accuracy but also improves the model’s generalizability.
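
The three stages described above can be sketched in a few lines, under heavy simplifying assumptions. The feature groups here are synthetic stand-ins rather than real financial ratios or MD&A embeddings, every learner is a small hand-written logistic regression (the paper compares heterogeneous learners such as random forest and extreme gradient boosting), and the dynamic selection mechanism is not modelled. All names, split sizes, and signal strengths are hypothetical.

```python
import math
import random
from typing import List, Sequence


def sigmoid(z: float) -> float:
    z = max(-60.0, min(60.0, z))  # clip to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(w: Sequence[float], x: Sequence[float]) -> float:
    # w holds one weight per feature plus a trailing bias term
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + w[-1])


def fit_logistic(X, y, lr: float = 0.5, epochs: int = 500) -> List[float]:
    """Plain batch gradient descent on the logistic loss."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = predict_proba(w, xi) - yi
            for j, xj in enumerate(xi):
                grad[j] += err * xj
            grad[-1] += err
        w = [wj - lr * gj / len(X) for wj, gj in zip(w, grad)]
    return w


def rows(data, idx):
    return [data[i] for i in idx]


# Stage 1 (stand-in): three synthetic feature groups per firm.
# Assumed signal strengths: financial > semantic > sentiment.
rng = random.Random(0)
n_firms = 300
y = [i % 2 for i in range(n_firms)]  # 1 = distressed (synthetic label)
SHIFTS = {"financial": 2.0, "semantic": 1.0, "sentiment": 0.5}
groups = {
    name: [[rng.gauss(shift * yi, 1.0) for _ in range(2)] for yi in y]
    for name, shift in SHIFTS.items()
}

# Separate splits so the meta-learner never sees base-training rows.
base_idx, meta_idx, test_idx = range(0, 120), range(120, 210), range(210, 300)

# Stage 2: one base model per feature group.
base_models = {
    name: fit_logistic(rows(X, base_idx), rows(y, base_idx))
    for name, X in groups.items()
}


def meta_features(idx):
    # Each firm is described by the three base-model probabilities.
    return [
        [predict_proba(base_models[name], groups[name][i]) for name in SHIFTS]
        for i in idx
    ]


# Stage 3: probabilistic fusion via a meta-learner on base probabilities.
meta_model = fit_logistic(meta_features(meta_idx), rows(y, meta_idx))
test_probs = [predict_proba(meta_model, row) for row in meta_features(test_idx)]
accuracy = sum(
    (p >= 0.5) == bool(t) for p, t in zip(test_probs, rows(y, test_idx))
) / len(test_idx)
print(f"stacked accuracy on held-out firms: {accuracy:.3f}")
```

Training the meta-learner on rows that the base models did not see is a common way to limit leakage in stacking; out-of-fold predictions from cross-validation would be the usual alternative when data are scarce.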
