What is the NVIDIA L40S best for?

7B–13B LLM inference, Stable Diffusion and other image generation, and agent fleets.

How much does the NVIDIA L40S cost per hour?

Pricing starts at $5.20/hr for a 4x L40S instance (about $1.30 per GPU-hour) from 1 indexed provider on Servers.Computer.

[ADA LOVELACE]

NVIDIA L40S

Inference-optimized, with the best $/token for 7B–13B serving.

The NVIDIA L40S (Ada Lovelace) is positioned as the highest-efficiency inference GPU for small-to-mid-size models. It is well suited to image generation, 7B/13B LLM serving, and agent fleets where cost per token dominates.

VRAM: 48 GB GDDR6
FP8 TFLOPS: 1,466 (with sparsity)
Mem BW: 864 GB/s
TDP: 350 W
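
As a rough sanity check on these figures, the sketch below estimates whether a model's weights fit in 48 GB and what single-stream decode rate 864 GB/s of memory bandwidth would allow. It assumes decode is memory-bandwidth bound (each generated token reads all weights once) and folds KV cache and activations into a fixed headroom. The function names, the 4 GB headroom, and the bytes-per-parameter values are illustrative assumptions, not vendor figures.

```python
# Spec values taken from this page; everything else is an illustrative assumption.
VRAM_GB = 48.0
MEM_BW_GB_S = 864.0


def weight_size_gb(params_billions: float, bytes_per_param: float) -> float:
    """Approximate weight footprint in GB: billions of params x bytes per param."""
    return params_billions * bytes_per_param


def fits_in_vram(params_billions: float, bytes_per_param: float,
                 headroom_gb: float = 4.0) -> bool:
    """True if weights plus an assumed headroom (KV cache, activations) fit in VRAM."""
    return weight_size_gb(params_billions, bytes_per_param) + headroom_gb <= VRAM_GB


def single_stream_tokens_per_sec(params_billions: float, bytes_per_param: float) -> float:
    """Upper bound on batch-size-1 decode rate if every token reads all weights once."""
    return MEM_BW_GB_S / weight_size_gb(params_billions, bytes_per_param)


if __name__ == "__main__":
    for params in (7.0, 8.0, 13.0):
        for label, bytes_per_param in (("FP16", 2.0), ("FP8", 1.0)):
            size = weight_size_gb(params, bytes_per_param)
            fits = fits_in_vram(params, bytes_per_param)
            rate = single_stream_tokens_per_sec(params, bytes_per_param)
            print(f"{params:.0f}B {label}: {size:.0f} GB weights, "
                  f"fits={fits}, ~{rate:.0f} tok/s single-stream bound")
```

Under these assumptions an 8B model in FP16 is bounded near 54 tokens/sec for a single stream, so aggregate throughput in the thousands of tokens per second implies batched serving.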

Best for

7B–13B LLM inference
Stable Diffusion / image gen
Agent fleets

Benchmarks

WORKLOAD	METRIC	VALUE
Llama-3 8B inference	tokens/sec	~2,100
SDXL 1024×1024	img/sec	~1.4
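
To relate throughput to cost, the sketch below converts an hourly price and a tokens/sec figure into dollars per million tokens. It uses the ~2,100 tokens/sec Llama-3 8B figure and the $5.20/hr 4x L40S listing from this page. It assumes, which the page does not state, that the benchmark is per GPU and that all GPUs are fully utilized. The function name is hypothetical.

```python
def cost_per_million_tokens(price_per_hour: float, gpus: int,
                            tokens_per_sec_per_gpu: float) -> float:
    """Dollars per one million generated tokens at full utilization."""
    tokens_per_hour = tokens_per_sec_per_gpu * gpus * 3600
    return price_per_hour / tokens_per_hour * 1_000_000


# Figures from this page; the per-GPU reading of the benchmark is an assumption.
PRICE_PER_HOUR = 5.20   # 4x NVIDIA L40S listing
GPUS = 4
TOKENS_PER_SEC = 2100   # Llama-3 8B inference, approximate

per_gpu_hour = PRICE_PER_HOUR / GPUS
cost = cost_per_million_tokens(PRICE_PER_HOUR, GPUS, TOKENS_PER_SEC)

print(f"${per_gpu_hour:.2f} per GPU-hour")
print(f"${cost:.3f} per million tokens")
```

With these inputs the estimate comes to roughly $0.17 per million tokens. Real-world cost will be higher whenever utilization is below 100%.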

NVIDIA L40S availability from $5.20/hr

NVIDIA Ada

4x NVIDIA L40S

Memory: 192 GB GDDR6 (4 x 48 GB)

Region: US-West-2

Availability: 97%

Provider: Lambda Labs

$5.20/hr

Definition

What is Servers.Computer?

Servers.Computer is an AI compute routing and procurement layer that benchmarks, compares, and deploys GPU clusters (NVIDIA H100, H200, B200 and AMD MI300) across global cloud providers in real time.

Other GPUs

NVIDIA B200, NVIDIA H200, NVIDIA H100, NVIDIA A100, AMD Instinct MI300X