By default, endpoints created for both LLM Serving and General Inference deployments are publicly accessible. If you prefer restricted access, you can make the endpoint private by selecting the Make it a private endpoint? option. This setting generates a TLS certificate upon deployment, which you can download. Accessing the endpoint then requires the HTTP client to present this certificate. Here are a few examples:

Using the curl command
# Pass the downloaded certificate via --cert
curl -X 'POST' 'https://<endpoint_url>/openai/v1/chat/completions' \
  --cert <path to TLS certificate> \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
  "messages": [
    {
      "role": "user",
      "content": "what is the meaning of life?"
    }
  ],
  "model": "meta-llama/Llama-3.2-3B-Instruct",
  "max_tokens": 512,
  "n": 1,
  "presence_penalty": 0,
  "stream": true,
  "stream_options": {
    "include_usage": true
  },
  "temperature": 0.7,
  "top_p": 1
}'
Using the httpx library
import httpx

# Present the downloaded certificate as the client-side TLS certificate
client = httpx.Client(cert="<path to TLS certificate>")
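The resulting client can then make requests against the endpoint directly. Below is a minimal sketch, assuming the same chat-completions route, model, and placeholder values as the curl example above:

response = client.post(
    "https://<endpoint_url>/openai/v1/chat/completions",
    json={
        "model": "meta-llama/Llama-3.2-3B-Instruct",
        "messages": [{"role": "user", "content": "what is the meaning of life?"}],
        "max_tokens": 512,
    },
)
print(response.json())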
Using the OpenAI client library
import httpx
from openai import OpenAI

client = OpenAI(
    api_key="no_key",  # Dummy value; access is controlled by the TLS certificate
    base_url="https://<endpoint_url>/openai/v1",
    # Route requests through an httpx client that presents the downloaded certificate
    http_client=httpx.Client(cert="<path to TLS certificate>"),
)
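Once constructed, this client behaves like any other OpenAI-compatible client; only the underlying transport differs. A minimal sketch, reusing the model name from the curl example:

completion = client.chat.completions.create(
    model="meta-llama/Llama-3.2-3B-Instruct",
    messages=[{"role": "user", "content": "what is the meaning of life?"}],
    max_tokens=512,
)
print(completion.choices[0].message.content)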

What’s Next

Managing Vault Objects

Learn how to generate and store CentML tokens and other vault objects.

Clients

Learn how to interact with the NVIDIA CCluster programmatically.

Resources and Pricing

Learn more about CentML pricing.

Get Support

Submit a support request.

Python SDK Reference

Learn how to access the NVIDIA CCluster using Python.

The Model Integration Lifecycle

Dive into how CentML can help optimize your Model Integration Lifecycle (MILC).