Venation‑aware hybrid CNN‑transformer for fine‑grained leaf species identification

Authors

  • Md Riaz Hasan, School of Computer Science and Engineering, Southeast University, Nanjing, China
  • Fariha Sultana, College of Computer Science and Software Engineering, Hohai University, Nanjing, China
  • Mohammad Ashraful Alam, Ecology, Environment and Natural Resource Laboratory, Department of Botany, University of Dhaka, Dhaka-1000, Bangladesh

DOI:

https://doi.org/10.3329/bjb.v54i4.86616

Keywords:

Leaf classification, Fine-grained recognition, Hybrid CNN-Transformer, Venation-aware model, Image-based classification

Abstract

Identifying plant species from leaf images is a foundational task in botany, agriculture, and biodiversity monitoring. Traditional approaches based on handcrafted features or convolutional neural networks (CNNs) focus on local texture and edge patterns but often overlook global morphological context, such as venation topology and overall shape. Vision transformers (ViTs), in contrast, capture long-range dependencies but lack the inductive bias needed to attend to fine-grained venation structures. This study proposes a venation-aware hybrid CNN-Transformer architecture for the fine-grained classification of five common leaf species (banana, guava, jackfruit, mango, and neem) using a high-quality dataset of 2,500 images. Each species contributes 500 labeled photographs, organized into separate directories. The images were captured under varied lighting, backgrounds, and viewpoints, making the task non-trivial. Morphological priors are introduced through edge and vein extraction, and local CNN features are fused with global ViT tokens via cross-attention and a venation consistency objective. Extensive experiments include ablation studies, baseline comparisons, calibration analysis, robustness to color shifts, and qualitative interpretability analysis using Grad-CAM and attention rollout. The proposed hybrid model achieves a test macro-F1 of 0.9973 and a balanced accuracy of 0.9973, significantly outperforming strong CNN and ViT baselines. Reliability diagrams indicate low miscalibration, and robustness tests show that the venation priors improve performance under background variation. All code, trained models, and experimental logs are released to facilitate reproducibility.
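
For readers unfamiliar with the two headline metrics, the following is a minimal sketch of how macro-F1 and balanced accuracy are conventionally computed for a five-class problem. It is not the authors' evaluation code: the function names and the toy label lists are illustrative assumptions, and only the class names come from the abstract.

```python
from collections import Counter

# Class names taken from the abstract; everything else here is illustrative.
LABELS = ["banana", "guava", "jackfruit", "mango", "neem"]


def per_class_counts(y_true, y_pred):
    """Count true positives, false positives and false negatives per class."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class gains a false positive
            fn[t] += 1  # true class gains a false negative
    return tp, fp, fn


def macro_f1(y_true, y_pred, labels=LABELS):
    """Unweighted mean of the per-class F1 scores."""
    tp, fp, fn = per_class_counts(y_true, y_pred)
    scores = []
    for c in labels:
        denom = 2 * tp[c] + fp[c] + fn[c]
        scores.append(2 * tp[c] / denom if denom else 0.0)
    return sum(scores) / len(scores)


def balanced_accuracy(y_true, y_pred, labels=LABELS):
    """Unweighted mean of per-class recall over classes present in y_true."""
    tp, _, fn = per_class_counts(y_true, y_pred)
    recalls = []
    for c in labels:
        support = tp[c] + fn[c]
        if support:
            recalls.append(tp[c] / support)
    return sum(recalls) / len(recalls)


# Toy example (hypothetical data): two images per class, one mango leaf
# misclassified as neem.
y_true = ["banana", "banana", "guava", "guava", "jackfruit",
          "jackfruit", "mango", "mango", "neem", "neem"]
y_pred = ["banana", "banana", "guava", "guava", "jackfruit",
          "jackfruit", "mango", "neem", "neem", "neem"]

print(f"macro-F1: {macro_f1(y_true, y_pred):.4f}")
print(f"balanced accuracy: {balanced_accuracy(y_true, y_pred):.4f}")
```

Both metrics weight every species equally, so a single confused class lowers the score regardless of how many images that class contributes. With a balanced dataset such as the one described (500 images per species), balanced accuracy coincides with plain accuracy whenever the test split is also balanced.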

Bangladesh J. Bot. 54(4): 891-900, 2025 (December)

Published

2025-12-30

How to Cite

Hasan, M. R., Sultana, F., & Alam, M. A. (2025). Venation‑aware hybrid CNN‑transformer for fine‑grained leaf species identification. Bangladesh Journal of Botany, 54(4), 891–900. https://doi.org/10.3329/bjb.v54i4.86616

Section

Articles