Securing Endpoints
NVIDIA CCluster does not expose unprotected public endpoints. Every LLM Serving and General Inference deployment must be protected, and clients must authenticate on each request. Secure a deployment with at least one of the following methods:
- Bearer token: an Authorization: Bearer <token> header sent with each request.
- mTLS client certificate: a TLS certificate that the client presents during the TLS handshake of each connection.
You choose the protection method when configuring a deployment. You may enable both for defense in depth.
Bearer token access
Generate a Bearer token from your Vault, then include it in the Authorization header of every request.
Using curl
curl -X 'POST' 'https://<endpoint_url>/openai/v1/chat/completions' \
-H 'Authorization: Bearer <your_bearer_token>' \
-H 'accept: application/json' \
-H 'Content-Type: application/json' \
-d '{
"messages": [
{
"role": "user",
"content": "what is the meaning of life?"
}
],
"model": "meta-llama/Llama-3.2-3B-Instruct",
"max_tokens": 512
}'
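If neither curl nor the openai package is available, the same request can be built with the Python standard library. This is a minimal sketch, not an official client: build_chat_request is a hypothetical helper name, and it assumes endpoint_url is the bare host name (no https:// prefix), matching the <endpoint_url> placeholder above. The function only builds the request; it does not send it.

```python
import json
import urllib.request


def build_chat_request(endpoint_url: str, token: str, prompt: str,
                       model: str = "meta-llama/Llama-3.2-3B-Instruct",
                       max_tokens: int = 512) -> urllib.request.Request:
    """Build (but do not send) the same POST request as the curl example.

    Hypothetical helper; endpoint_url is assumed to be a host name only.
    """
    body = {
        "messages": [{"role": "user", "content": prompt}],
        "model": model,
        "max_tokens": max_tokens,
    }
    return urllib.request.Request(
        url=f"https://{endpoint_url}/openai/v1/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            # The Bearer token travels in the Authorization header on every request
            "Authorization": f"Bearer {token}",
            "Accept": "application/json",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Passing the result to urllib.request.urlopen sends it, and json.load on the response object parses the JSON reply.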
Using the OpenAI client library
from openai import OpenAI
# The client sends api_key as an "Authorization: Bearer <token>" header
client = OpenAI(
    api_key="<your_bearer_token>",
    base_url="https://<endpoint_url>/openai/v1",
)
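To avoid hard-coding the token in source files, the client can read it from the environment at startup. This is a sketch under assumptions: load_bearer_token and the variable name CCLUSTER_BEARER_TOKEN are hypothetical, not part of the platform.

```python
import os


def load_bearer_token(var_name: str = "CCLUSTER_BEARER_TOKEN") -> str:
    """Read the Bearer token from an environment variable.

    Both the helper and the default variable name are hypothetical.
    """
    token = os.environ.get(var_name, "").strip()
    if not token:
        # Fail early with a clear message instead of sending an unauthenticated request
        raise RuntimeError(f"Set {var_name} to the Bearer token generated in your Vault")
    return token
```

The result can then be passed as OpenAI(api_key=load_bearer_token(), base_url="https://<endpoint_url>/openai/v1").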
mTLS certificate access
Select the Make it a private endpoint? option when configuring a deployment. The deployment then generates a TLS client certificate that you can download, and every HTTP client that calls the endpoint must present this certificate. The following examples show how:
Using curl
# --cert presents the downloaded client certificate during the TLS handshake
curl -X 'POST' 'https://<endpoint_url>/openai/v1/chat/completions' \
--cert <path to TLS certificate> \
-H 'accept: application/json' \
-H 'Content-Type: application/json' \
-d '{
"messages": [
{
"role": "user",
"content": "what is the meaning of life?"
}
],
"model": "meta-llama/Llama-3.2-3B-Instruct",
"max_tokens": 512,
"n": 1,
"presence_penalty": 0,
"stream": true,
"stream_options": {
"include_usage": true
},
"temperature": 0.7,
"top_p": 1
}'
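Because this request sets "stream": true, the server replies with a sequence of server-sent events instead of a single JSON body. The sketch below joins the streamed text and picks up the token counts requested by "include_usage": true. It assumes the OpenAI-compatible streaming format (each event is a data: <json> line, the stream ends with data: [DONE], and the usage chunk has an empty choices list); collect_stream is a hypothetical helper, and it reads only the first choice because the example sets "n": 1.

```python
import json
from typing import Iterable, Optional, Tuple


def collect_stream(lines: Iterable[str]) -> Tuple[str, Optional[dict]]:
    """Join streamed content deltas and capture the final usage chunk.

    Assumes OpenAI-compatible SSE chunks; hypothetical helper.
    """
    parts = []
    usage = None
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and other SSE fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            if choice.get("index", 0) != 0:
                continue  # only the first choice is collected
            content = choice.get("delta", {}).get("content")
            if content:
                parts.append(content)
        # With include_usage, one chunk carries usage; others have null or no usage
        if chunk.get("usage"):
            usage = chunk["usage"]
    return "".join(parts), usage
```

With httpx, the lines can come from response.iter_lines() inside a client.stream("POST", ...) block.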
Using the httpx library
import httpx
# Present the downloaded client certificate on every HTTPS connection
client = httpx.Client(cert="<path to TLS certificate>")
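Instead of a certificate path, both httpx (through its verify= parameter) and the standard library's urllib.request.urlopen (through context=) accept a prebuilt ssl.SSLContext. The sketch below assumes the downloaded file is a PEM file; if the private key ships in a separate file, pass it as keyfile. make_mtls_context is a hypothetical helper name.

```python
import ssl
from typing import Optional


def make_mtls_context(certfile: str, keyfile: Optional[str] = None) -> ssl.SSLContext:
    """Build a TLS context that verifies the server and presents a client certificate.

    Hypothetical helper; certfile is assumed to be the downloaded PEM file.
    """
    # Default client-side context: verifies the server certificate and hostname
    ctx = ssl.create_default_context(ssl.Purpose.SERVER_AUTH)
    # Attach the client certificate (and key) used for mutual TLS
    ctx.load_cert_chain(certfile=certfile, keyfile=keyfile)
    return ctx
```

The context can then be used as httpx.Client(verify=make_mtls_context("<path to TLS certificate>")). A missing or unreadable certificate file raises an OSError (or its subclass ssl.SSLError) when the context is built, before any request is sent.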
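Using the OpenAI client library
import httpx
from openai import OpenAI
client = OpenAI(
    # The OpenAI client requires an api_key; pass your Bearer token here if the deployment also uses one
    api_key="no_key",
    base_url="https://<endpoint_url>/openai/v1",
    # The httpx client presents the downloaded certificate on each connection
    http_client=httpx.Client(cert="<path to TLS certificate>"),
)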