Securing Endpoints

NVIDIA CCluster does not expose public endpoints. Every LLM Serving and General Inference deployment must be protected, and clients must authenticate on each request. You secure a deployment with at least one of the following:

  • Bearer token — an Authorization: Bearer <token> header on each request.
  • mTLS client certificate — a TLS certificate the client presents on each request.

You choose the protection method when configuring a deployment. You may enable both for defense in depth.

Bearer token access

Generate a Bearer token from your Vault, then include it in the Authorization header of every request.

Using curl command

curl -X 'POST' 'https://<endpoint_url>/openai/v1/chat/completions' \
  -H 'Authorization: Bearer <your_bearer_token>' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
    "messages": [
      {
        "role": "user",
        "content": "what is the meaning of life?"
      }
    ],
    "model": "meta-llama/Llama-3.2-3B-Instruct",
    "max_tokens": 512
  }'

Using OpenAI client library

from openai import OpenAI

# The client sends api_key as "Authorization: Bearer <token>" on every request.
client = OpenAI(
    api_key="<your_bearer_token>",
    base_url="https://<endpoint_url>/openai/v1",
)
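
The snippet above only constructs the client. As a minimal sketch of sending a request with it, the hypothetical helper below (the name ask is not part of any library) mirrors the body of the curl example. It assumes the endpoint returns a non-streaming, OpenAI-compatible chat completion.

```python
def ask(client, prompt, model="meta-llama/Llama-3.2-3B-Instruct", max_tokens=512):
    """Send one user message through an OpenAI-style client and return the reply text.

    `ask` is a hypothetical helper; `client` is expected to be the
    OpenAI(...) instance configured with your Bearer token.
    """
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        max_tokens=max_tokens,
    )
    # Assumes a non-streaming, OpenAI-compatible response with at least one choice.
    return response.choices[0].message.content
```

With the client defined above, print(ask(client, "what is the meaning of life?")) would print the model's reply.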

mTLS certificate access

When configuring a deployment, select the "Make it a private endpoint?" option. A TLS certificate is generated at deployment time, and you can download it. Every HTTP client that accesses the endpoint must then present this certificate. Here are a few examples:

Using curl command

# --cert points to the TLS certificate downloaded from the deployment
curl -X 'POST' 'https://<endpoint_url>/openai/v1/chat/completions' \
  --cert '<path to TLS certificate>' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
    "messages": [
      {
        "role": "user",
        "content": "what is the meaning of life?"
      }
    ],
    "model": "meta-llama/Llama-3.2-3B-Instruct",
    "max_tokens": 512,
    "n": 1,
    "presence_penalty": 0,
    "stream": true,
    "stream_options": {
      "include_usage": true
    },
    "temperature": 0.7,
    "top_p": 1
  }'
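
Because the request above sets "stream": true, the reply arrives incrementally rather than as a single JSON document. The sketch below shows one way to reassemble it. It assumes the endpoint follows the OpenAI streaming convention: server-sent event lines prefixed with data:, a final data: [DONE] marker, and, when include_usage is true, a last chunk carrying a usage object. This page does not confirm that format, and the helper name collect_stream is hypothetical.

```python
import json


def collect_stream(lines):
    """Assemble streamed text and the usage report from server-sent event lines."""
    text_parts = []
    usage = None
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and non-data fields
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break  # assumed end-of-stream marker
        chunk = json.loads(data)
        # Each chunk is assumed to carry incremental text in choices[].delta.content.
        for choice in chunk.get("choices") or []:
            delta = choice.get("delta") or {}
            text_parts.append(delta.get("content") or "")
        # With include_usage, one chunk is assumed to report token counts.
        if chunk.get("usage"):
            usage = chunk["usage"]
    return "".join(text_parts), usage
```

With httpx, the lines could come from response.iter_lines() inside a client.stream("POST", url, json=payload) block.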

Using httpx library

import httpx

client = httpx.Client(cert="<path to TLS certificate>")
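
The line above only creates the client. A minimal sketch of issuing the same chat request through it follows. The helper name post_chat is hypothetical, the payload is copied from the curl example, and the response is assumed to follow the OpenAI chat completion schema.

```python
def post_chat(client, endpoint_url, prompt,
              model="meta-llama/Llama-3.2-3B-Instruct", max_tokens=512):
    """POST one user message with an httpx-style client and return the reply text.

    `post_chat` is a hypothetical helper; `client` is expected to be the
    httpx.Client created with the downloaded certificate.
    """
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }
    response = client.post(
        f"https://{endpoint_url}/openai/v1/chat/completions",
        json=payload,
    )
    # Raise on 4xx/5xx status codes instead of parsing an error body as a reply.
    response.raise_for_status()
    # Assumes an OpenAI-compatible, non-streaming response body.
    return response.json()["choices"][0]["message"]["content"]
```

For example, post_chat(client, "<endpoint_url>", "what is the meaning of life?") would return the reply text.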

Using OpenAI client library

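import httpx
from openai import OpenAI

client = OpenAI(
    # The OpenAI client requires a non-empty api_key; with mTLS-only
    # protection this value is a placeholder.
    api_key="no_key",
    base_url="https://<endpoint_url>/openai/v1",
    # The underlying HTTP client presents the downloaded certificate.
    http_client=httpx.Client(cert="<path to TLS certificate>"),
)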
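
When a deployment has both protections enabled, the two examples can be combined in one client. The sketch below is one possible way to choose the settings. The helper name build_client_options is hypothetical, and it assumes such a deployment expects both the token and the certificate on every request.

```python
def build_client_options(bearer_token=None, cert_path=None):
    """Return (api_key, httpx_kwargs) for whichever protections a deployment uses.

    `build_client_options` is a hypothetical helper, not part of any library.
    """
    if not bearer_token and not cert_path:
        raise ValueError("provide a Bearer token, a client certificate, or both")
    # The OpenAI client needs a non-empty api_key, so use a placeholder for mTLS-only.
    api_key = bearer_token or "no_key"
    # httpx.Client accepts the certificate path through its `cert` argument.
    httpx_kwargs = {"cert": cert_path} if cert_path else {}
    return api_key, httpx_kwargs


# Usage, with placeholder values:
#   api_key, httpx_kwargs = build_client_options(
#       "<your_bearer_token>", "<path to TLS certificate>")
#   client = OpenAI(
#       api_key=api_key,
#       base_url="https://<endpoint_url>/openai/v1",
#       http_client=httpx.Client(**httpx_kwargs),
#   )
```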

What's next