1 Research Scholar, MTech Computer Science and Engineering, SIPNA College of Engineering and Technology, Amravati.
2 Professor, Computer Science and Engineering, SIPNA College of Engineering and Technology, Amravati.
World Journal of Advanced Research and Reviews, 2025, 28(03), 001-008
Article DOI: 10.30574/wjarr.2025.28.3.4010
Received on 22 October 2025; revised on 29 November 2025; accepted on 01 December 2025
FusionNet is a parallel, hybrid deep-learning framework engineered for next-generation speech recognition and on-device speech-to-text processing. The system is implemented as an Android application (Java/XML) and integrated with Firebase Realtime Database to support secure, user-centric data management. Audio input undergoes a multi-stage preprocessing pipeline where MFCC, spectral, and temporal features are extracted and clustered using K-Means to group acoustically similar speech segments. These clustered representations are simultaneously processed through a dual-branch architecture: a Convolutional Neural Network (CNN) that learns spectral signatures and a Bidirectional Long Short-Term Memory (BiLSTM) network that models temporal dependencies. The fused embeddings are then classified using a Random Forest classifier, improving prediction stability in noisy or accent-variable conditions.
To enhance semantic clarity, an NLP engine supported by a generative AI model refines the raw transcriptions, corrects contextual errors, and extracts user intent. Real-time inference is achieved via TensorFlow Lite (TFLite), enabling low-latency, energy-efficient execution directly on mobile hardware without cloud dependency. FusionNet demonstrates robustness against ambient noise, speaker variability, and multilingual inputs, making it a practical and scalable solution for voice-driven applications. This hybrid architecture effectively combines clustering, parallel deep learning, classical ML classification, and generative AI reasoning to deliver an intelligent, high-accuracy speech recognition system tailored for real-world deployment.
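The feature-clustering stage named in the abstract can be illustrated with a minimal K-Means sketch. It is not the authors' implementation: the function names (`kmeans`, `squared_distance`, `mean_vector`), the toy three-dimensional vectors standing in for per-frame MFCC features, the choice of `k=2`, and the fixed random seed are all assumptions made for illustration. A real pipeline would likely use a feature-extraction library and higher-dimensional frames.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_vector(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def assign(frames, centroids):
    """Label each frame with the index of its nearest centroid."""
    return [
        min(range(len(centroids)), key=lambda j: squared_distance(f, centroids[j]))
        for f in frames
    ]


def kmeans(frames, k, iterations=20, seed=0):
    """Plain Lloyd's-algorithm K-Means; returns (labels, centroids)."""
    rng = random.Random(seed)
    # Initialise centroids from k distinct frames (hypothetical choice).
    centroids = [list(c) for c in rng.sample(frames, k)]
    for _ in range(iterations):
        labels = assign(frames, centroids)
        new_centroids = []
        for j in range(k):
            members = [f for f, label in zip(frames, labels) if label == j]
            # Keep the old centroid if a cluster ends up empty.
            new_centroids.append(mean_vector(members) if members else centroids[j])
        if new_centroids == centroids:
            break  # converged: assignments can no longer change
        centroids = new_centroids
    # Recompute labels so they always match the returned centroids.
    return assign(frames, centroids), centroids


# Toy stand-ins for per-frame feature vectors: two well-separated groups.
frames = [
    [0.0, 0.1, 0.0], [0.1, 0.0, 0.1], [0.0, 0.0, 0.1],
    [5.0, 5.1, 5.0], [5.1, 5.0, 4.9], [4.9, 5.0, 5.0],
]
labels, centroids = kmeans(frames, k=2)
print(labels)
```

In this sketch the first three frames share one cluster label and the last three share the other. Frames grouped this way could then be passed on to downstream models, which is the role the abstract assigns to clustering ahead of the CNN and BiLSTM branches.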
Speech Recognition; FusionNet; MFCC; CNN–BiLSTM; Feature Clustering; K-Means; Random Forest; NLP; Generative AI; Speech-To-Text; On-Device AI; TensorFlow Lite; Mobile Deep Learning; Firebase Realtime Database; Multilingual Processing
Revati Harichandra Ramteke and Seema B. Rathod. FusionNet: A parallel deep learning model for speech recognition with feature clustering. World Journal of Advanced Research and Reviews, 2025, 28(03), 001-008. Article DOI: https://doi.org/10.30574/wjarr.2025.28.3.4010.
Copyright © 2025 Author(s) retain the copyright of this article. This article is published under the terms of the Creative Commons Attribution License 4.0.