Choosing the wrong GPU for your AI/ML workload can waste thousands of dollars per month. This comprehensive guide breaks down when to use A100, H100, L4, T4, and other GPUs, with real pricing and ROI calculations.
GPU Pricing Comparison (AWS, GCP, Azure Average)
| GPU Model | Memory | On-Demand ($/hr) | Spot ($/hr) | Best For |
|---|---|---|---|---|
| H100 | 80GB | $8.20 | $0.82 | Large model training, LLM fine-tuning |
| A100 | 80GB | $4.10 | $1.23 | Training, large batch inference |
| A100 | 40GB | $3.06 | $0.92 | Medium-size model training |
| L4 | 24GB | $0.94 | $0.28 | Inference, video AI, fine-tuning |
| T4 | 16GB | $0.53 | $0.16 | Light inference, development |
| V100 | 16GB | $2.48 | $0.74 | Legacy training workloads |
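To plug these rates into your own planning, here is a minimal Python sketch. The rates and the 730-hour month are assumptions taken from the table above; your provider's actual prices will differ.

```python
# Minimal sketch: estimate monthly GPU cost from the hourly rates above.
RATES = {  # (on_demand_per_hr, spot_per_hr), cross-cloud averages from the table
    "H100-80GB": (8.20, 0.82),
    "A100-80GB": (4.10, 1.23),
    "A100-40GB": (3.06, 0.92),
    "L4-24GB":   (0.94, 0.28),
    "T4-16GB":   (0.53, 0.16),
    "V100-16GB": (2.48, 0.74),
}

HOURS_PER_MONTH = 730  # average hours in a month

def monthly_cost(gpu: str, count: int = 1, spot: bool = False,
                 hours: float = HOURS_PER_MONTH) -> float:
    """Estimated monthly cost for `count` GPUs of the given type."""
    on_demand, spot_rate = RATES[gpu]
    rate = spot_rate if spot else on_demand
    return rate * count * hours

if __name__ == "__main__":
    # Example: 10x A100 80GB on-demand vs 10x L4 on-demand (see Use Case 1 below)
    print(f"A100 x10: ${monthly_cost('A100-80GB', 10):,.0f}/month")
    print(f"L4   x10: ${monthly_cost('L4-24GB', 10):,.0f}/month")
```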
The Decision Tree: Which GPU Should You Use?
For Training Large Models (> 20B parameters)
- H100 - Fastest training, up to 3x faster than A100
- A100 80GB - Budget option, excellent performance
- Use spot instances with checkpointing for 70-90% discount
For Training Medium Models (1B-20B parameters)
- A100 40GB - Best price/performance
- 2x L4 - If distributed training is possible
- A100 80GB - If model barely fits in 40GB
For Inference
- L4 - Best for most inference workloads (77% cheaper than A100)
- T4 - For lightweight models, mobile deployment
- A100 - Only for large batch inference or real-time large models
For Fine-Tuning
- L4 - Perfect for LoRA, QLoRA fine-tuning
- A100 40GB - Full fine-tuning of 7B-13B models
- H100 - Full fine-tuning of 70B+ models
For Development/Testing
- T4 - 87% cheaper than A100
- L4 - If testing production-like workloads
- Auto-shutdown after hours - Save 66% by shutting down nights/weekends
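The decision tree above can be expressed as a small helper function. This is an illustrative sketch that simply mirrors the headings above, not a definitive rule set; tune the thresholds to your own workloads.

```python
def recommend_gpu(workload: str, params_b: float = 0.0) -> str:
    """Illustrative mapping of the decision tree above.

    workload: one of "training", "inference", "fine-tuning", "dev"
    params_b: model size in billions of parameters (training / fine-tuning)
    """
    if workload == "training":
        if params_b > 20:
            return "H100 (or A100 80GB on a budget); use spot + checkpointing"
        return "A100 40GB (or 2x L4 if distributed training is possible)"
    if workload == "inference":
        return "L4 for most models; T4 for lightweight models; A100 only for large batches"
    if workload == "fine-tuning":
        if params_b >= 70:
            return "H100 for full fine-tuning"
        return "L4 for LoRA/QLoRA; A100 40GB for full fine-tuning of 7B-13B models"
    if workload == "dev":
        return "T4 (or L4 for production-like tests) with auto-shutdown"
    raise ValueError(f"unknown workload: {workload}")

print(recommend_gpu("fine-tuning", params_b=7))  # -> L4 for LoRA/QLoRA; ...
```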
Real-World Use Cases & Savings
🎯 Use Case 1: Computer Vision Inference
Before: Running 10 A100 instances for real-time object detection
Cost: $4.10/hr × 10 × 730 hours = $29,930/month
After: Switched to L4 instances (same throughput)
New Cost: $0.94/hr × 10 × 730 hours = $6,862/month
Annual Savings: $277,000 (a 77% reduction)
🎯 Use Case 2: LLM Fine-Tuning (7B Model)
Before: A100 80GB on-demand for LoRA fine-tuning
Cost: $4.10/hr × 8 hours = $32.80 per experiment
After: Switched to a single L4 spot instance (runs take ~10 hours instead of 8)
New Cost: $0.28/hr × 10 hours = $2.80 per experiment
Savings: $30 per run (a 91% reduction)
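Why a 24GB L4 is enough here: with the base model loaded in 4-bit and only small LoRA adapters trained (QLoRA-style), a 7B model fits comfortably. A rough sketch using the Hugging Face transformers and peft libraries; the model name is a placeholder and the hyperparameters are illustrative, not tuned.

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

MODEL_NAME = "your-7b-model"  # placeholder: any 7B causal LM checkpoint

# Load the base model in 4-bit so it fits in the L4's 24GB (QLoRA-style).
model = AutoModelForCausalLM.from_pretrained(
    MODEL_NAME,
    quantization_config=BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_compute_dtype=torch.bfloat16,
    ),
    device_map="auto",
)

# Attach small LoRA adapters; only these are trained, keeping memory low.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],  # illustrative; depends on the architecture
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # typically well under 1% of total parameters
```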
🎯 Use Case 3: Training Large Language Model (70B)
Before: 8x A100 80GB on-demand for 2 weeks
Cost: $4.10/hr × 8 × 336 hours = $11,020.80
After: 8x H100 spot instances with checkpointing
Training Time: 4.5 days instead of 14 days (3x faster)
New Cost: $0.82/hr × 8 × 108 hours = $708.48
Savings: $10,312 per training run (a 94% reduction)
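The arithmetic behind all three use cases is the same: hourly rate × GPU count × wall-clock hours, with hours adjusted for any speedup. A minimal sketch reproducing the Use Case 3 numbers:

```python
def run_cost(rate_per_hr: float, gpu_count: int, hours: float) -> float:
    """Cost of a single training run."""
    return rate_per_hr * gpu_count * hours

# Use Case 3: 8x A100 on-demand for 2 weeks vs 8x H100 spot at ~3x the speed.
before = run_cost(4.10, 8, 14 * 24)   # $11,020.80
after = run_cost(0.82, 8, 4.5 * 24)   # $708.48
print(f"Savings per run: ${before - after:,.2f} ({1 - after / before:.0%})")
```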
Common Mistakes & How to Avoid Them
❌ Mistake #1: Using A100 for Everything
Impact: Paying roughly 4x more than necessary for inference (77% of that spend is avoidable)
Solution: Profile your workload. If GPU utilization is < 40%, downgrade to L4 or T4.
❌ Mistake #2: Ignoring Spot Instances
Impact: Paying 3-10x more for training than the equivalent spot price
Solution: Implement checkpointing every 15-30 minutes. Use spot for all training jobs.
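As an illustration of what "checkpointing every 15-30 minutes" looks like, here is a PyTorch sketch. The checkpoint path, interval, and training loop are assumptions; the point is the save-then-resume pattern, so a spot interruption only loses the work done since the last checkpoint.

```python
import os
import time
import torch

CKPT_PATH = "/mnt/checkpoints/latest.pt"  # assumed: durable storage that outlives the instance
CKPT_INTERVAL_S = 20 * 60                 # save roughly every 20 minutes

def save_checkpoint(model, optimizer, step):
    torch.save(
        {"model": model.state_dict(), "optimizer": optimizer.state_dict(), "step": step},
        CKPT_PATH,
    )

def load_checkpoint(model, optimizer):
    """Resume from the latest checkpoint if one exists; otherwise start at step 0."""
    if not os.path.exists(CKPT_PATH):
        return 0
    ckpt = torch.load(CKPT_PATH, map_location="cpu")
    model.load_state_dict(ckpt["model"])
    optimizer.load_state_dict(ckpt["optimizer"])
    return ckpt["step"]

def train(model, optimizer, data_loader, total_steps):
    step = load_checkpoint(model, optimizer)
    last_save = time.time()
    for batch in data_loader:
        if step >= total_steps:
            break
        loss = model(batch).loss  # assumes the model returns an output with .loss
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()
        step += 1
        if time.time() - last_save > CKPT_INTERVAL_S:
            save_checkpoint(model, optimizer, step)
            last_save = time.time()
    save_checkpoint(model, optimizer, step)
```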
❌ Mistake #3: Running Dev Environments 24/7
Impact: Wasting 66% of dev budget on idle resources
Solution: Auto-shutdown dev GPUs at 6pm, restart at 8am. Schedule-based autoscaling.
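One way to implement that schedule, assuming AWS (the instance ID is a placeholder; in practice you would look instances up by a tag such as env=dev): a small script that a scheduler such as cron or a Lambda invokes at 6pm with `stop` and at 8am with `start`.

```python
import sys
import boto3

# Placeholder ID: in practice, look up dev GPU instances by tag.
DEV_GPU_INSTANCE_IDS = ["i-0123456789abcdef0"]

def set_dev_instances(action: str) -> None:
    """Stop or start the dev GPU instances; call from a scheduler at 18:00 / 08:00."""
    ec2 = boto3.client("ec2")
    if action == "stop":
        ec2.stop_instances(InstanceIds=DEV_GPU_INSTANCE_IDS)
    elif action == "start":
        ec2.start_instances(InstanceIds=DEV_GPU_INSTANCE_IDS)
    else:
        raise ValueError("action must be 'stop' or 'start'")

if __name__ == "__main__":
    set_dev_instances(sys.argv[1])  # e.g. `python gpu_schedule.py stop`
```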
❌ Mistake #4: Not Using Multi-GPU for Training
Impact: 3x longer training times, higher costs
Solution: Use distributed training. 4x L4 ($3.76/hr) can match 1x A100 ($4.10/hr) with better fault tolerance.
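A minimal sketch of what data-parallel training across 4 L4s looks like with PyTorch DDP, launched via torchrun. The linear model and random batches are stand-ins for your real model and a DataLoader with a DistributedSampler.

```python
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    # torchrun sets RANK, LOCAL_RANK, and WORLD_SIZE for each of the 4 processes.
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    # Stand-in model and data; replace with your real model and a DataLoader
    # that uses a DistributedSampler so each rank sees a different shard.
    model = DDP(torch.nn.Linear(1024, 1024).cuda(), device_ids=[local_rank])
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

    for _ in range(100):
        x = torch.randn(32, 1024, device=local_rank)
        loss = model(x).pow(2).mean()  # dummy loss; DDP syncs gradients across GPUs
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()  # launch with: torchrun --nproc_per_node=4 ddp_sketch.py
```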
GPU Selection Cheat Sheet
Quick Reference Guide
🚀 For Speed (Training Large Models):
- H100 spot instances - Fastest, 90% discount
- Multi-GPU A100 - If H100 unavailable
💰 For Cost (Budget-Conscious):
- L4 for inference - 77% cheaper than A100
- T4 for dev/test - 87% cheaper than A100
- Always use spot for training - 70-90% discount
⚖️ For Balance (Production Workloads):
- L4 for most inference (excellent price/performance)
- A100 40GB for training mid-size models
- Mix of on-demand and spot for reliability
Monitoring & Optimization
Choosing the right GPU is step one. Continuous monitoring ensures you're not overpaying:
Key Metrics to Track:
- GPU Utilization: If < 40%, downgrade to cheaper GPU
- Memory Usage: If using < 50%, switch to smaller GPU
- Training Time: Compare cost vs time tradeoff
- Inference Latency: Ensure L4/T4 meet SLA requirements
- Spot Interruption Rate: Track reliability of spot instances
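The first two metrics can be sampled directly on the instance with NVIDIA's NVML bindings (the pynvml package). A small sketch, with thresholds mirroring the guidance above:

```python
import pynvml

pynvml.nvmlInit()
try:
    for i in range(pynvml.nvmlDeviceGetCount()):
        handle = pynvml.nvmlDeviceGetHandleByIndex(i)
        util = pynvml.nvmlDeviceGetUtilizationRates(handle).gpu  # percent
        mem = pynvml.nvmlDeviceGetMemoryInfo(handle)
        mem_pct = 100 * mem.used / mem.total

        flags = []
        if util < 40:
            flags.append("GPU util < 40% -> consider a cheaper GPU")
        if mem_pct < 50:
            flags.append("memory < 50% used -> consider a smaller GPU")
        print(f"GPU {i}: util={util}% mem={mem_pct:.0f}% {'; '.join(flags)}")
finally:
    pynvml.nvmlShutdown()
```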
Implementation Roadmap
Week 1: Audit Current Usage
- List all GPU instances and their usage patterns
- Identify training vs inference workloads
- Calculate current monthly spend
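As a starting point for the audit, assuming AWS (the instance-family patterns are examples; adjust them to the GPU families you actually run), a sketch that lists running GPU instances and their tags:

```python
import boto3

# Example AWS GPU families: p4d/p5 (A100/H100), g4dn (T4), g5 (A10G), g6 (L4)
GPU_INSTANCE_PATTERNS = ["p4d.*", "p5.*", "g4dn.*", "g5.*", "g6.*"]

ec2 = boto3.client("ec2")
pages = ec2.get_paginator("describe_instances").paginate(
    Filters=[
        {"Name": "instance-type", "Values": GPU_INSTANCE_PATTERNS},
        {"Name": "instance-state-name", "Values": ["running"]},
    ]
)

for page in pages:
    for reservation in page["Reservations"]:
        for inst in reservation["Instances"]:
            tags = {t["Key"]: t["Value"] for t in inst.get("Tags", [])}
            print(inst["InstanceId"], inst["InstanceType"], tags.get("Name", ""))
```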
Week 2: Quick Wins
- Switch all inference from A100 to L4 (77% savings)
- Move dev/test to T4 (87% savings)
- Enable auto-shutdown for dev environments (66% savings)
Week 3: Advanced Optimization
- Enable spot instances for training with checkpointing
- Implement distributed training on cheaper GPUs
- Set up cost alerts and budgets
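For cost alerts, one option on AWS is a monthly budget with a notification at 80% of the limit. A sketch using the AWS Budgets API; the account ID, budget amount, and email address are placeholders.

```python
import boto3

budgets = boto3.client("budgets")
budgets.create_budget(
    AccountId="123456789012",  # placeholder AWS account ID
    Budget={
        "BudgetName": "gpu-monthly-budget",
        "BudgetLimit": {"Amount": "25000", "Unit": "USD"},  # target spend after optimization
        "TimeUnit": "MONTHLY",
        "BudgetType": "COST",
    },
    NotificationsWithSubscribers=[
        {
            "Notification": {
                "NotificationType": "ACTUAL",
                "ComparisonOperator": "GREATER_THAN",
                "Threshold": 80.0,  # alert at 80% of the budget
                "ThresholdType": "PERCENTAGE",
            },
            "Subscribers": [{"SubscriptionType": "EMAIL", "Address": "finops@example.com"}],
        }
    ],
)
```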
Week 4: Monitoring & Refinement
- Track GPU utilization and right-sizing opportunities
- Fine-tune autoscaling policies
- Document cost savings and present to leadership
Conclusion: The $200K+ Savings Opportunity
For a typical AI company spending $50K/month on GPUs, implementing these optimizations can reduce costs to $20-25K/month - an annual savings of $300-360K.
The key is matching workload requirements to GPU capabilities. Not every job needs an A100, and spot instances can provide discounts of up to 90% with manageable risk once checkpointing is in place.
Automate Your GPU Cost Optimization
OpenFinOps provides automatic GPU right-sizing recommendations, spot instance management, and real-time cost tracking.
About the Author: This guide is maintained by the OpenFinOps team, who help organizations optimize AI/ML infrastructure costs. OpenFinOps is open source and free to use. Visit openfinops.org to learn more.