[ADA LOVELACE]

NVIDIA L40S

Inference-optimized — the best $/token for 7B–13B serving.

The NVIDIA L40S (Ada Lovelace) is the highest-efficiency inference GPU for small-to-mid models. It is well suited to image generation, 7B/13B LLM serving, and agent fleets where cost-per-token dominates.

VRAM: 48 GB GDDR6
FP8 TFLOPS: 1,466 (with sparsity)
Memory bandwidth: 864 GB/s
TDP: 350 W

Best for

  • 7B–13B LLM inference
  • Stable Diffusion / image gen
  • Agent fleets
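As a rough sanity check on the 7B–13B sizing, the sketch below estimates how much of the 48 GB of VRAM the model weights alone would occupy. The bytes-per-parameter figures (2 for FP16, 1 for FP8) are standard, but the function name and the "headroom" framing are illustrative assumptions. The estimate ignores KV cache, activations, and runtime overhead, so actual usage will be higher.

```python
# Rough weight-memory estimate only; KV cache, activations, and
# framework overhead are NOT included (assumption: weights dominate).
VRAM_GB = 48  # L40S memory from the spec list above

# Weight precision -> bytes per parameter
BYTES_PER_PARAM = {"fp16": 2.0, "fp8": 1.0}


def weight_footprint_gb(params_billion: float, precision: str) -> float:
    """Approximate weight size in GB (1 GB = 1e9 bytes)."""
    return params_billion * BYTES_PER_PARAM[precision]


for size in (7, 13):
    for precision in ("fp16", "fp8"):
        gb = weight_footprint_gb(size, precision)
        headroom = VRAM_GB - gb
        print(f"{size}B {precision}: ~{gb:.0f} GB weights, "
              f"~{headroom:.0f} GB left for KV cache and overhead")
```

Under these assumptions a 13B model at FP16 needs about 26 GB for weights, which leaves room on a single 48 GB card for KV cache and batching.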

Benchmarks

WORKLOAD              METRIC       VALUE
Llama-3 8B inference  tokens/sec   ~2,100
SDXL 1024×1024        img/sec      ~1.4

NVIDIA L40S availability from $5.20/hr

NVIDIA Ada

4x NVIDIA L40S

Memory: 192 GB GDDR6
Region: US-West-2
Availability: 97%
Provider: Lambda Labs
$5.20/hr
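
To make the cost-per-token claim concrete, here is a minimal sketch that combines the listing price with the Llama-3 8B benchmark figure. It assumes the ~2,100 tokens/sec value is per GPU and that the $5.20/hr price covers all four GPUs in the listing. The page states neither explicitly, so treat the result as illustrative rather than a quoted rate. The function and constant names are hypothetical.

```python
def cost_per_million_tokens(price_per_hour: float, tokens_per_sec: float) -> float:
    """Dollars per one million generated tokens at sustained throughput."""
    tokens_per_hour = tokens_per_sec * 3600
    return price_per_hour / tokens_per_hour * 1_000_000


LISTING_PRICE_PER_HOUR = 5.20   # price shown for the 4x L40S listing
GPUS_IN_LISTING = 4             # ASSUMPTION: price covers all four GPUs
TOKENS_PER_SEC_PER_GPU = 2100   # ASSUMPTION: benchmark figure is per GPU

per_gpu_price = LISTING_PRICE_PER_HOUR / GPUS_IN_LISTING
usd = cost_per_million_tokens(per_gpu_price, TOKENS_PER_SEC_PER_GPU)
print(f"~${usd:.3f} per million tokens")
```

With these inputs the estimate comes out to roughly $0.17 per million tokens. Real-world cost depends on sustained utilization, batch size, and sequence lengths, none of which the benchmark row specifies.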
Definition

What is Servers.Computer?

Servers.Computer is an AI compute routing and procurement layer that benchmarks, compares, and deploys GPU clusters (NVIDIA H100, H200, B200 and AMD MI300) across global cloud providers in real time.