World Journal of Advanced Research and Reviews
International Journal with High Impact Factor for fast publication of Research and Review articles

eISSN: 2581-9615 || CODEN (USA): WJARAI || Impact Factor: 8.2 || ISSN Approved Journal

Optimizing GPU Utilization for AI Workloads on AWS EKS

Praneel Madabushini *

NVIDIA Corporation, USA.

Review Article

World Journal of Advanced Research and Reviews, 2025, 26(01), 1955-1963

Article DOI: 10.30574/wjarr.2025.26.1.1233

DOI url: https://doi.org/10.30574/wjarr.2025.26.1.1233

Received on 25 February 2025; revised on 12 April 2025; accepted on 14 April 2025

This article explores strategies for optimizing GPU utilization for artificial intelligence (AI) workloads on Amazon Elastic Kubernetes Service (EKS). As organizations deploy increasingly compute-intensive AI applications, effective GPU resource management has become critical for balancing performance requirements against cost. The article examines four optimization domains: GPU instance selection and scheduling, cost optimization and resource allocation, performance enhancement with NVIDIA-specific tools, and model-level optimization. Drawing on investigation findings and industry benchmarks, it shows how selecting appropriate instance types and pairing them with scheduling tools such as Karpenter and Cluster Autoscaler lays the foundation for efficient GPU utilization. It then discusses how spot instances, precise resource allocation, and comprehensive monitoring can substantially reduce infrastructure costs. Finally, it highlights the performance advantages of NVIDIA tools such as TensorRT and Triton Inference Server, and examines how model-level techniques, including mixed precision training, gradient accumulation, knowledge distillation, quantization, and pruning, can maximize computational efficiency while preserving model accuracy.
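The abstract names instance-type selection, spot capacity, Karpenter scheduling, and precise resource allocation. As a minimal sketch of how these choices can surface in a single workload definition on EKS, the Python function below builds a Kubernetes Pod manifest that requests whole GPUs and steers the pod toward spot-backed nodes of a chosen instance type. It is not taken from the article and assumes that the NVIDIA device plugin is installed (it advertises the `nvidia.com/gpu` resource), that nodes are provisioned by Karpenter (which sets the `karpenter.sh/capacity-type` label), and that GPU nodes carry an `nvidia.com/gpu:NoSchedule` taint. The function name `gpu_pod_manifest`, the container image, and the instance type are hypothetical placeholders.

```python
import json
from typing import Optional


def gpu_pod_manifest(name: str, image: str, gpus: int = 1,
                     instance_type: Optional[str] = None,
                     use_spot: bool = True) -> dict:
    """Return a Pod manifest (as a dict) that requests whole NVIDIA GPUs.

    Sketch only: assumes the NVIDIA device plugin and Karpenter are installed.
    """
    if gpus < 1:
        raise ValueError("gpus must be a positive integer")

    # nvidia.com/gpu is the extended resource advertised by the NVIDIA device
    # plugin. GPUs go in limits; Kubernetes uses the limit as the request.
    resources = {"limits": {"nvidia.com/gpu": gpus}}

    node_selector = {}
    if use_spot:
        # Karpenter labels the nodes it launches with their capacity type.
        node_selector["karpenter.sh/capacity-type"] = "spot"
    if instance_type:
        # Well-known Kubernetes label holding the EC2 instance type.
        node_selector["node.kubernetes.io/instance-type"] = instance_type

    spec = {
        "containers": [
            {"name": name, "image": image, "resources": resources}
        ],
        # Assumption: GPU nodes are tainted nvidia.com/gpu:NoSchedule so that
        # only pods that tolerate the taint are scheduled onto them.
        "tolerations": [
            {"key": "nvidia.com/gpu", "operator": "Exists", "effect": "NoSchedule"}
        ],
    }
    if node_selector:
        spec["nodeSelector"] = node_selector

    return {"apiVersion": "v1", "kind": "Pod",
            "metadata": {"name": name}, "spec": spec}


if __name__ == "__main__":
    # Hypothetical image and instance type, for illustration only.
    manifest = gpu_pod_manifest("trainer", "example.com/trainer:latest",
                                gpus=1, instance_type="g5.xlarge")
    # kubectl accepts JSON as well as YAML, e.g. `python this_file.py | kubectl apply -f -`
    print(json.dumps(manifest, indent=2))
```

Requesting GPUs only through limits keeps the scheduler's accounting exact, since Kubernetes does not overcommit extended resources, while the node selector lets Karpenter launch capacity that matches the pod. Because spot instances can be reclaimed, this pattern suits workloads that checkpoint their progress.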

Keywords: GPU optimization; AWS EKS; Machine Learning Infrastructure; Inference Acceleration; Resource Allocation

https://journalwjarr.com/sites/default/files/fulltext_pdf/WJARR-2025-1233.pdf

Praneel Madabushini. Optimizing GPU Utilization for AI Workloads on AWS EKS. World Journal of Advanced Research and Reviews, 2025, 26(01), 1955-1963. Article DOI: https://doi.org/10.30574/wjarr.2025.26.1.1233.

Copyright © 2025 The Author(s). The author(s) retain the copyright of this article. This article is published under the terms of the Creative Commons Attribution License 4.0.