What is the NVIDIA H200 best for?

Long-context LLM inference; 70B–405B fine-tuning; RAG + agent serving at scale.

How much does the NVIDIA H200 cost per hour?

Pricing starts at $32.60/hr across 1 indexed providers on Servers.computer.

[HOPPER]

NVIDIA H200

Name: NVIDIA H200
Brand: NVIDIA

Hopper, refreshed — 76% more memory than H100.

The NVIDIA H200 is the memory-upgraded refresh of the Hopper architecture, with 141 GB HBM3e and 4.8 TB/s bandwidth. It's the practical sweet spot for production LLM inference and 70B-class fine-tuning where context length and KV cache dominate.

VRAM

141 GB HBM3e

FP8 TFLOPS

1,979

Mem BW

4.8 TB/s

TDP

700 W

Best for

Long-context LLM inference
70B–405B fine-tuning
RAG + agent serving at scale

Benchmarks

WORKLOAD	METRIC	VALUE
Llama-3 70B inference (FP8)	tokens/sec	~2,400
Mixtral 8x22B inference	tokens/sec	~1,900
FP8 dense matmul	TFLOPS	1,979

NVIDIA H200 availability from $32.60/hr

NVIDIA Hopper

8x NVIDIA H200

Memory1128GB HBM

RegionUS-East-1

Availability71%

ProviderAWS

$32.60/hr

Deploy →

Definition

What is Servers.Computer?

Servers.Computer is an AI compute routing and procurement layer that benchmarks, compares, and deploys GPU clusters (NVIDIA H100, H200, B200 and AMD MI300) across global cloud providers in real time.

Other GPUs

NVIDIA B200 NVIDIA H100 NVIDIA A100 NVIDIA L40S AMD Instinct MI300X