World Journal of Advanced Research and Reviews

eISSN: 2581-9615 || CODEN (USA): WJARAI || Impact Factor: 8.2 || ISSN Approved Journal

Designing resilient, low-latency data pipelines for streaming big data analytics using Apache Kafka and Spark ecosystems

Uju Ugonna Uzoagu *

Department of Computer Science, College of Computing and Software Engineering, Kennesaw State University, USA.

Review Article

World Journal of Advanced Research and Reviews, 2025, 27(03), 1856-1873

Article DOI: 10.30574/wjarr.2025.27.3.3369

DOI url: https://doi.org/10.30574/wjarr.2025.27.3.3369

Received on 21 August 2025; revised on 26 September 2025; accepted on 29 September 2025

The exponential growth of real-time data streams from digital platforms, Internet of Things (IoT) devices, and enterprise applications has redefined the requirements for big data analytics. Traditional batch-processing architectures, while robust for historical analysis, are increasingly insufficient for low-latency decision-making in sectors such as finance, healthcare, telecommunications, and e-commerce. Consequently, resilient streaming data pipelines have become critical for fault-tolerant, scalable, high-throughput analytics. This study explores the design and implementation of resilient, low-latency data pipelines for streaming big data analytics using the Apache Kafka and Apache Spark ecosystems. Kafka, a distributed publish-subscribe messaging system, provides durable, fault-tolerant, and scalable ingestion, while Spark Structured Streaming delivers near-real-time analytical processing and integrates with machine learning workflows. Together, these technologies form a complementary foundation for streaming pipelines that handle large volumes of high-velocity data. The paper discusses architectural design principles, including partitioning strategies, replication for fault tolerance, stateful stream processing, and backpressure handling. It further evaluates techniques for ensuring end-to-end resilience, such as exactly-once semantics and checkpointing, as well as integration with containerized environments like Kubernetes for scalable deployment. Case study insights report latency benchmarks and system performance under varying workloads, demonstrating how the Kafka-Spark integration supports enterprise-grade analytics. By uniting resilience, scalability, and analytical depth, the proposed pipeline framework enables organizations to obtain real-time insights while remaining reliable under fluctuating conditions. The findings contribute practical guidelines for architects, engineers, and decision-makers seeking to operationalize streaming analytics infrastructures that meet the growing demands of modern data-driven enterprises.
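
As a rough illustration of two ideas named in the abstract, key-based partitioning and exactly-once results through checkpointed offsets, the following is a minimal sketch in plain Python. It is not the paper's implementation and uses neither Kafka nor Spark; the names `partition_for`, `CheckpointedCounter`, `build_logs`, and `consume`, the partition count, and the sample keys are all hypothetical and chosen only for this example.

```python
import zlib
from collections import defaultdict

NUM_PARTITIONS = 3  # hypothetical partition count for this sketch


def partition_for(key, num_partitions=NUM_PARTITIONS):
    """Map a record key to a partition with a stable hash.

    CRC32 is used here for simplicity; Kafka's default partitioner
    for keyed records uses a murmur2 hash instead.
    """
    return zlib.crc32(key.encode("utf-8")) % num_partitions


class CheckpointedCounter:
    """Toy stateful consumer that keeps per-key counts and per-partition offsets.

    State and offsets are updated together, so records redelivered after
    a restart are recognized by their offset and skipped.
    """

    def __init__(self):
        self.counts = defaultdict(int)
        self.next_offset = defaultdict(int)  # partition -> next offset to apply

    def process(self, partition, offset, key):
        if offset < self.next_offset[partition]:
            return False  # already applied: duplicate delivery
        self.counts[key] += 1
        self.next_offset[partition] = offset + 1
        return True


def build_logs(keys):
    """Append each key to its partition's log; the list index acts as the offset."""
    logs = defaultdict(list)
    for key in keys:
        logs[partition_for(key)].append(key)
    return logs


def consume(consumer, logs):
    """Deliver every record from offset 0 and return how many were applied."""
    applied = 0
    for partition, records in logs.items():
        for offset, key in enumerate(records):
            if consumer.process(partition, offset, key):
                applied += 1
    return applied


keys = ["sensor-a", "sensor-b", "sensor-a", "sensor-c", "sensor-a"]
logs = build_logs(keys)
consumer = CheckpointedCounter()
first_pass = consume(consumer, logs)  # normal processing
replayed = consume(consumer, logs)    # simulated redelivery after a restart
```

Because every record with the same key lands in the same partition, per-key ordering is preserved, and because offsets are tracked alongside the counts, the replay changes nothing. In a real Kafka and Spark deployment these roles would be played by the producer's partitioner and by Structured Streaming's checkpoint location together with an idempotent or transactional sink.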

Keywords: Streaming data pipelines; Apache Kafka; Apache Spark; Big data analytics; Low-latency processing; Resilient architectures

https://journalwjarr.com/sites/default/files/fulltext_pdf/WJARR-2025-3369.pdf

Uju Ugonna Uzoagu. Designing resilient, low-latency data pipelines for streaming big data analytics using Apache Kafka and Spark ecosystems. World Journal of Advanced Research and Reviews, 2025, 27(03), 1856-1873. Article DOI: https://doi.org/10.30574/wjarr.2025.27.3.3369.

Copyright © 2025 Author(s) retain the copyright of this article. This article is published under the terms of the Creative Commons Attribution License 4.0.
