Bangla TTS Performance Evaluation: A Benchmark Study on Synthesized Speech Quality and Intelligibility

Authors

  • Mehadi Hasan, Department of Computer Science and Engineering, Faculty of Engineering and Technology, University of Dhaka, Dhaka, Bangladesh
  • Dipto Shaha, Department of Computer Science and Engineering, Faculty of Engineering and Technology, University of Dhaka, Dhaka, Bangladesh
  • Md Rezaul Karim, Department of Computer Science and Engineering, Faculty of Engineering and Technology, University of Dhaka, Dhaka, Bangladesh

DOI:

https://doi.org/10.3329/dujs.v74i1.84122

Keywords:

Text-to-Speech (TTS), Speech Synthesis, Benchmarking, Objective Evaluation Metrics, Subjective Evaluation Metrics

Abstract

Bangla Text-to-Speech (TTS) systems have advanced significantly in recent years, yet comprehensive benchmarking of their performance remains limited. This study establishes an evaluation framework for comparing Bangla TTS models, including Tacotron 2 [1], FastSpeech 2 [2], VITS [3], and Grad-TTS [4]. The benchmarking approach combines objective and subjective assessment. Objective evaluation uses signal-level and recognition-based metrics: Mel Cepstral Distortion (MCD), Mel-Spectrogram Mean Squared Error (Mel-MSE), Phoneme Error Rate (PER), Word Error Rate (WER), Signal-to-Noise Ratio (SNR), and Real-Time Factor (RTF). Subjective evaluation relies on human perceptual tests such as the Mean Opinion Score (MOS) test, in which native Bangla speakers rate speech quality and intelligibility. The experimental setup ensures a fair comparison by using a standardized dataset, uniform computational conditions, and diverse sentence structures. The results show the relative strengths and weaknesses of the models and highlight the need for improved phonetic accuracy and naturalness in Bangla speech synthesis. This research provides insights for advancing Bangla TTS systems and aligning them with state-of-the-art English TTS models.
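
For readers unfamiliar with the objective metrics named in the abstract, the following is a minimal sketch of how three of them (WER/PER, MCD, and RTF) are commonly computed. It is not the authors' implementation. All function names are hypothetical, the MCD function assumes the reference and synthesized frames are already time-aligned with the 0th cepstral coefficient excluded, and the transcript passed to the error-rate function is assumed to come from a separate speech recognizer.

```python
import math


def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def error_rate(ref_tokens, hyp_tokens):
    """WER when tokens are words, PER when tokens are phonemes."""
    if not ref_tokens:
        raise ValueError("reference must contain at least one token")
    return edit_distance(ref_tokens, hyp_tokens) / len(ref_tokens)


def mel_cepstral_distortion(ref_frames, syn_frames):
    """Mean MCD in dB over frames of mel-cepstral coefficients.

    Assumption: frames are already time-aligned (for example by DTW)
    and the caller has dropped the 0th (energy) coefficient.
    """
    if not ref_frames or len(ref_frames) != len(syn_frames):
        raise ValueError("need equal, non-zero numbers of frames")
    const = (10.0 / math.log(10.0)) * math.sqrt(2.0)
    total = 0.0
    for r, s in zip(ref_frames, syn_frames):
        total += math.sqrt(sum((a - b) ** 2 for a, b in zip(r, s)))
    return const * total / len(ref_frames)


def real_time_factor(synthesis_seconds, audio_seconds):
    """Synthesis time divided by audio duration; below 1.0 is faster than real time."""
    if audio_seconds <= 0:
        raise ValueError("audio duration must be positive")
    return synthesis_seconds / audio_seconds


if __name__ == "__main__":
    # Illustrative sentences only; not taken from the study's dataset.
    reference = "আমি ভাত খাই".split()
    recognized = "আমি ভাত খাব".split()
    print("WER:", error_rate(reference, recognized))
    print("RTF:", real_time_factor(0.5, 2.0))
```

Lower values are better for all three quantities. Because the same edit-distance routine serves both WER and PER, only the tokenization step (words versus phonemes) differs between them.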

Dhaka Univ. J. Sci. 74(1): 10-16, 2026 (January)

Published

2026-01-28

How to Cite

Hasan, M., Shaha, D., & Karim, M. R. (2026). Bangla TTS Performance Evaluation: A Benchmark Study on Synthesized Speech Quality and Intelligibility. Dhaka University Journal of Science, 74(1), 10–16. https://doi.org/10.3329/dujs.v74i1.84122

Section

Articles