Benchmark Suites

vLLM contains two sets of benchmarks:

Performance Benchmarks

The performance benchmarks are used for development to confirm whether new changes improve performance under various workloads. They are triggered on every commit with both the perf-benchmarks and ready labels, and when a PR is merged into vLLM.

Manually Trigger the Benchmark

Use the vllm-ci-test-repo images, which include the vLLM benchmark suite.
For a CPU environment, use the image with the "-cpu" postfix.

Here is an example docker run command for CPU:

docker run -it --entrypoint /bin/bash -v /data/huggingface:/root/.cache/huggingface  -e HF_TOKEN=''  --shm-size=16g --name vllm-cpu-ci  public.ecr.aws/q9t5s3a7/vllm-ci-test-repo:1da94e673c257373280026f75ceb4effac80e892-cpu
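
For a GPU environment, a similar command should work with the image tag without the "-cpu" postfix. The sketch below is illustrative: the --gpus all flag, the container name vllm-gpu-ci, and the <commit-hash> placeholder are assumptions, not values taken from the CI setup.

docker run -it --entrypoint /bin/bash --gpus all -v /data/huggingface:/root/.cache/huggingface -e HF_TOKEN='' --shm-size=16g --name vllm-gpu-ci public.ecr.aws/q9t5s3a7/vllm-ci-test-repo:<commit-hash>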

Then, run the command below inside the container:

bash .buildkite/nightly-benchmarks/scripts/run-performance-benchmarks.sh

When run, the benchmark script generates results under the benchmark/results folder, along with benchmark_results.md and benchmark_results.json.
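
For example, assuming the paths above are relative to the directory where the script was run, you can list the output and read the Markdown summary directly:

ls benchmark/results/
cat benchmark/results/benchmark_results.md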

Runtime environment variables

  • ON_CPU: set the value to '1' on Intel® Xeon® Processors. Default value is 0.
  • SERVING_JSON: JSON file to use for the serving tests. Default value is empty string (use default file).
  • LATENCY_JSON: JSON file to use for the latency tests. Default value is empty string (use default file).
  • THROUGHPUT_JSON: JSON file to use for the throughput tests. Default value is empty string (use default file).
  • REMOTE_HOST: IP for the remote vLLM service to benchmark. Default value is empty string.
  • REMOTE_PORT: Port for the remote vLLM service to benchmark. Default value is empty string.
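
As a sketch of how these variables combine, the following command runs the suite on a CPU machine against a remote vLLM service with a custom serving configuration. The IP address, port, and JSON path are hypothetical placeholders, not defaults of the script.

ON_CPU=1 SERVING_JSON=/path/to/custom-serving-tests.json \
REMOTE_HOST=10.0.0.5 REMOTE_PORT=8000 \
bash .buildkite/nightly-benchmarks/scripts/run-performance-benchmarks.sh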

For additional ways to visualize the results, check visualizing the results.

The latest performance results are hosted on the public vLLM Performance Dashboard.

More information on the performance benchmarks and their parameters can be found in Benchmark README and performance benchmark description.

Nightly Benchmarks

These compare vLLM's performance against alternatives (tgi, trt-llm, and lmdeploy) when there are major updates to vLLM (e.g., bumping to a new version). They are primarily intended for consumers to evaluate when to choose vLLM over other options, and are triggered on every commit with both the perf-benchmarks and nightly-benchmarks labels.

The latest nightly benchmark results are shared in major release blog posts such as vLLM v0.6.0.

More information on the nightly benchmarks and their parameters can be found here.