By default, endpoints created for both LLM Serving and General Inference deployments are publicly accessible. If you prefer restricted access, you can make the endpoint private by selecting the Make it a private endpoint? option.
This setting generates a TLS certificate upon deployment, which you can download. Accessing the endpoint then requires the HTTP client to present this certificate. Here are a few examples:
Using the curl command
# --cert points to the TLS certificate downloaded after deployment
curl -X 'POST' 'https://<endpoint_url>/openai/v1/chat/completions' \
  --cert <path to TLS certificate> \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
    "messages": [
      {
        "role": "user",
        "content": "what is the meaning of life?"
      }
    ],
    "model": "meta-llama/Llama-3.2-3B-Instruct",
    "max_tokens": 512,
    "n": 1,
    "presence_penalty": 0,
    "stream": true,
    "stream_options": {
      "include_usage": true
    },
    "temperature": 0.7,
    "top_p": 1
  }'
Using the httpx library
import httpx

# Pass the downloaded certificate so the client presents it on every request
client = httpx.Client(cert="<path to TLS certificate>")
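The client can then call the endpoint directly. The following is a minimal sketch; the endpoint URL, model name, and request body mirror the curl example above:

response = client.post(
    "https://<endpoint_url>/openai/v1/chat/completions",
    headers={"Content-Type": "application/json"},
    json={
        "messages": [{"role": "user", "content": "what is the meaning of life?"}],
        "model": "meta-llama/Llama-3.2-3B-Instruct",
        "max_tokens": 512,
    },
)
print(response.json())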
Using the OpenAI client library
import httpx
from openai import OpenAI

# The certificate is supplied through a custom httpx client
client = OpenAI(
    api_key="no_key",
    base_url="https://<endpoint_url>/openai/v1",
    http_client=httpx.Client(cert="<path to TLS certificate>"),
)
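From there, the client is used like any other OpenAI-compatible client. A minimal sketch, reusing the model and prompt from the curl example above:

completion = client.chat.completions.create(
    model="meta-llama/Llama-3.2-3B-Instruct",
    messages=[{"role": "user", "content": "what is the meaning of life?"}],
    max_tokens=512,
)
print(completion.choices[0].message.content)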