Both Serverless and Dedicated LLM APIs from CentML support structured outputs through the same OpenAI-compatible API. These features are crucial for agentic workloads that require reliable data parsing and function calling. The CentML platform also provides reasoning-enabled models (e.g., deepseek-ai/DeepSeek-R1) that can reason before generating structured outputs.
JSON Schema Output
When you need a response strictly formatted as JSON, you can use JSON schema constraints. This is particularly useful in scenarios where your system or other downstream processes rely on valid JSON.
Below is an example of how to set up a request to the CentML LLM API using JSON schema enforcement.
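A minimal sketch of such a request follows. The schema (a small weather report) and the model ID are illustrative choices, and the response_format shape follows OpenAI's structured-outputs convention, which CentML's OpenAI-compatible endpoint mirrors; check the CentML model documentation to confirm that the model you pick supports JSON-schema-constrained output.
from openai import OpenAI
import os

client = OpenAI(
    api_key=os.environ["CENTML_API_KEY"],
    base_url="https://api.centml.com/openai/v1",
)

# Illustrative schema: the model must return a city name and an integer temperature.
weather_schema = {
    "type": "object",
    "properties": {
        "city": {"type": "string"},
        "temperature_c": {"type": "integer"}
    },
    "required": ["city", "temperature_c"],
    "additionalProperties": False
}

completion = client.chat.completions.create(
    model="meta-llama/Llama-3.3-70B-Instruct",  # illustrative; confirm structured-output support in the model docs
    messages=[
        {"role": "system", "content": "Respond only with JSON that matches the provided schema."},
        {"role": "user", "content": "What is the temperature in Paris right now?"}
    ],
    response_format={
        "type": "json_schema",
        "json_schema": {"name": "weather_report", "schema": weather_schema, "strict": True}
    },
    temperature=0
)

print(completion.choices[0].message.content)  # a JSON string conforming to weather_schema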
How it Works
- Prompt Construction: Provide a system message telling the model to respond in JSON, along with the JSON schema itself.
- Schema Enforcement: In the response_format parameter, specify "type": "json_schema" and include your JSON schema definition.
- Parsing the Output: The response is guaranteed to follow the schema (as strictly as the model can enforce it). You can then parse it directly in your application, as shown below.
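Continuing the sketch above, the returned message content is a JSON string that can be loaded directly:
import json

report = json.loads(completion.choices[0].message.content)
print(report["city"], report["temperature_c"])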
Function Calling
CentML’s LLM APIs support function calling similarly to OpenAI’s “function calling” feature. This allows you to define “tools” that the model can call with structured parameters. For example, you might have a get_weather function that the model can invoke based on user requests.
from openai import OpenAI
import os
import json
import time
api_key = os.environ.get("CENTML_API_KEY")
assert api_key is not None, "Please provide an API Key"
base_url = "https://api.centml.com/openai/v1"
def get_weather(location):
    """
    Simulates getting weather data for a location.
    In a real application, this would call a weather API.
    """
    weather_data = {
        "Paris, France": "temperature: 22",
        "London, UK": "temperature: 18",
        "New York, USA": "temperature: 25",
        "Tokyo, Japan": "temperature: 27",
        "Bogotá, Colombia": "temperature: 20"
    }
    return weather_data.get(location, "temperature: 20")
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get current temperature for a given location.",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {
                    "type": "string",
                    "description": "City and country e.g. Bogotá, Colombia"
                }
            },
            "required": ["location"],
            "additionalProperties": False
        },
        "strict": True
    }
}]
def process_tool_calling(client, model_id, messages, available_functions=None):
    """
    Process a tool calling request with proper error handling.

    Args:
        client: The OpenAI client
        model_id: The ID of the model to use
        messages: The messages to send to the model
        available_functions: Dictionary mapping function names to their implementations

    Returns:
        str: The text response from the model
    """
    if available_functions is None:
        available_functions = {"get_weather": get_weather}
    response_text = ""
    try:
        # First request: the model decides whether to answer directly or call a tool.
        chat_completion = client.chat.completions.create(
            model=model_id,
            messages=messages,
            max_tokens=4096,
            tools=tools,
            tool_choice="auto"
        )
        tool_calls = chat_completion.choices[0].message.tool_calls
        if tool_calls:
            # Record the assistant's tool-call message in the conversation history.
            assistant_message = {
                "role": "assistant",
                "content": chat_completion.choices[0].message.content or "",
                "tool_calls": [
                    {
                        "id": tool_call.id,
                        "type": tool_call.type,
                        "function": {
                            "name": tool_call.function.name,
                            "arguments": tool_call.function.arguments
                        }
                    } for tool_call in tool_calls
                ]
            }
            updated_messages = messages + [assistant_message]
            # Execute each requested tool and append its result as a "tool" message.
            for tool_call in tool_calls:
                function_name = tool_call.function.name
                function_to_call = available_functions[function_name]
                function_args = json.loads(tool_call.function.arguments)
                print(f"Function name: {function_name}, Function args: {function_args}")
                function_response = function_to_call(location=function_args.get("location"))
                print(f"Function response: {function_response}")
                updated_messages.append({
                    "tool_call_id": tool_call.id,
                    "role": "tool",
                    "name": function_name,
                    "content": function_response
                })
            # Second request: the model turns the tool results into a final answer.
            second_response = client.chat.completions.create(
                model=model_id,
                messages=updated_messages
            )
            response_text = second_response.choices[0].message.content
            print(f"Final response: {response_text}")
        else:
            print("No tool calls found")
            response_text = chat_completion.choices[0].message.content
            print(f"Direct response: {response_text}")
    except Exception as e:
        error_message = str(e)
        if "tool choice requires --enable-auto-tool-choice" in error_message:
            print("⚠️ ERROR: Tool calling is not supported by this model.")
            print("The model requires server-side configuration for tool calling.")
            print("Try using a different model or approach.")
        else:
            print(f"Error: {error_message}")
    return response_text
system_message = {
    "content": "You are a helpful assistant. You have access to ONLY get_weather function that provides temperature information for locations. ONLY use this function when explicitly asked about weather or temperature. For all other questions, respond directly without using any tools.",
    "role": "system"
}
weather_question = {
    "content": "What is the weather like in Paris today?",
    "role": "user"
}
general_question = {
    "content": "Who is the president of the United States?",
    "role": "user"
}
client = OpenAI(api_key=api_key, base_url=base_url)
models = client.models.list()
models = [model for model in models if model.id == "meta-llama/Llama-3.3-70B-Instruct"]
for model in models:
    print(f"\n--- Testing model: {model.id} ---")

    print("\nTesting with weather question:")
    weather_prompt = [system_message, weather_question]
    start_time = time.time()
    response = process_tool_calling(
        client=client,
        model_id=model.id,
        messages=weather_prompt,
        available_functions={"get_weather": get_weather}
    )
    elapsed_time = time.time() - start_time
    print(f"Weather question time taken: {elapsed_time:.2f} seconds")

    print("\nTesting with general question:")
    general_prompt = [system_message, general_question]
    start_time = time.time()
    response = process_tool_calling(
        client=client,
        model_id=model.id,
        messages=general_prompt,
        available_functions={"get_weather": get_weather}
    )
    elapsed_time = time.time() - start_time
    print(f"General question time taken: {elapsed_time:.2f} seconds")
How it Works
- Tool Definition: In the tools parameter, define a function with a name, description, and a JSON schema for its parameters.
- Function Invocation: The model may decide to call the function (tool), returning the parameters it deems relevant based on user input.
- Your Application Logic: You receive the structured function arguments in chat_completion.choices[0].message.tool_calls. You can then handle them programmatically (e.g., call an actual weather service).
Tip: Make sure your model (e.g., meta-llama/Llama-3.3-70B-Instruct or deepseek-ai/DeepSeek-R1) is function-call capable. Not all model versions support function calling. Always check the CentML model documentation for compatibility.
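If you are unsure which model IDs are currently served on the endpoint, you can list them with the same client; this only confirms availability, not tool-calling support, so the model documentation remains the reference for capabilities.
# Using the client created in the example above, print the model IDs currently served.
for m in client.models.list():
    print(m.id)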
Best Practices and Tips
- Schema Validation: The model will try to adhere to your schema, but always perform server-side validation before using the data (especially important in production); see the sketch after this list.
- Temperature Setting: When generating structured data, lower the temperature to reduce the likelihood of extraneous or incorrect fields.
- Use Reasoning Models: For complex tasks requiring reasoning steps, consider using deepseek-ai/DeepSeek-R1. It provides chain-of-thought style reasoning capabilities.
- Production Hardening: If you’re building an agentic or workflow-driven application, ensure you handle potential parsing errors and fallback scenarios gracefully.
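To make the validation and hardening points concrete, here is a small, self-contained sketch that checks a structured response against its schema with the third-party jsonschema package before trusting it. The schema, helper name, and fallback policy are illustrative assumptions, not part of the CentML API.
import json
from jsonschema import ValidationError, validate  # third-party: pip install jsonschema

# Illustrative schema; in practice reuse the one you sent in response_format.
weather_schema = {
    "type": "object",
    "properties": {"city": {"type": "string"}, "temperature_c": {"type": "integer"}},
    "required": ["city", "temperature_c"],
    "additionalProperties": False
}

def parse_structured_reply(raw_content, schema):
    """Return the parsed dict if the reply is valid JSON matching the schema, else None."""
    try:
        data = json.loads(raw_content)
        validate(instance=data, schema=schema)
        return data
    except (json.JSONDecodeError, ValidationError) as err:
        # Fallback path: log the problem and let the caller retry (e.g., at a lower temperature) or use a default.
        print(f"Structured output rejected: {err}")
        return None

report = parse_structured_reply('{"city": "Paris", "temperature_c": 22}', weather_schema)
print(report)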
Both Serverless and Dedicated LLM APIs share the same interface and usage pattern. You only need to change the base URL (and any relevant credentials) if you switch between them.
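For example (the dedicated base URL below is a placeholder; use the endpoint shown on your deployment's details page):
from openai import OpenAI
import os

api_key = os.environ["CENTML_API_KEY"]

# Serverless endpoint, as used throughout this post.
serverless_client = OpenAI(api_key=api_key, base_url="https://api.centml.com/openai/v1")

# Dedicated endpoint: identical calls; only the base URL (and credentials, if different) change.
dedicated_client = OpenAI(api_key=api_key, base_url="https://<your-dedicated-endpoint>")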
Conclusion
Using JSON schema enforcement and function calling (tools) with CentML LLM APIs lets you build more reliable, agentic, and automated workflows. With minimal changes to your existing OpenAI-compatible code, you can take advantage of these features on the CentML platform—whether you’re deploying on Serverless or Dedicated.
For more details, visit our CentML Documentation or reach out on Slack if you have any questions!