Efficient Depression Detection based on Encoder-only Transformer Architecture: A Review

Authors

  • Shreeya Mishra, Department of Computer Science, Hansraj College, University of Delhi
  • Baljeet Kaur, Department of Computer Science, Hansraj College, University of Delhi

DOI:

https://doi.org/10.61113/impact.V2I1.1250

Keywords:

Encoder-only Transformers, Deep Learning, Depression detection, clinical interviews, social media platforms

Abstract

Depression, or major depressive disorder, is a common yet serious mental disorder characterised by persistent sadness and loss of interest over an extended period. It affects how an individual feels, thinks, behaves, and handles day-to-day activities, drains emotional and physical energy, and may lead to suicidal thoughts. Early detection of depressive symptoms is essential for timely intervention; however, traditional diagnostic methods are often subjective and resource-intensive. This study aims to analyse the successful techniques used in depression detection with Artificial Intelligence. Clinical interview-based datasets, such as DAIC-WOZ and E-DAIC, and data from social media platforms such as Twitter and Reddit are considered for this review. The review comprises a comprehensive study of traditional Machine Learning techniques, such as Support Vector Machines, Random Forests, and Logistic Regression, that rely on handcrafted linguistic features; these techniques show limited capability in capturing contextual and emotional nuances. Learned feature representations from Deep Learning techniques such as Recurrent Neural Networks, Long Short-Term Memory networks, and Gated Recurrent Units have improved performance metrics, but they too struggle to capture the contextual semantics present in natural language. These limitations have motivated this study to inspect the potential of Transformer-based architectures. The gradual shift to Transformers is due to their self-attention mechanism and their ability to capture global contextual relationships within text, which forms the backbone of modern Natural Language Processing. The study presents an in-depth analysis of encoder-only architectures, as they utilize bidirectional self-attention that helps in identifying subtle linguistic markers, understanding contextual embeddings, and capturing deep semantic meaning.
Through this review, a comprehensive chronological account of effective Transformer-based techniques and their reported performance has been assembled.
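The bidirectional self-attention the abstract refers to can be illustrated with a minimal sketch. This is a generic single-head scaled dot-product attention over a toy sequence, not any specific model from the reviewed literature; all array shapes and weight matrices here are illustrative assumptions. The key point is that the attention score matrix is computed over all token pairs with no causal mask, so every token's output vector is conditioned on both its left and right context.

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Single-head bidirectional self-attention (toy sketch)."""
    # Project token embeddings into queries, keys, and values.
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    # Scaled dot-product scores over ALL token pairs: no causal
    # mask, so attention is bidirectional, as in encoder-only models.
    scores = Q @ K.T / np.sqrt(K.shape[-1])
    # Row-wise softmax: each row is a distribution over all tokens.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    # Each output row is a context-weighted mix of value vectors.
    return weights @ V

rng = np.random.default_rng(0)
d = 4                        # toy embedding dimension (assumption)
X = rng.normal(size=(3, d))  # 3 token embeddings (assumption)
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
out = self_attention(X, Wq, Wk, Wv)
print(out.shape)  # (3, 4): one contextualized vector per token
```

In a real encoder-only Transformer such as BERT, this operation is repeated across multiple heads and stacked layers, and the resulting contextual embeddings are what downstream depression-detection classifiers are trained on.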

Published

2026-02-06