CentML Platform makes it easy to containerize and deploy custom models. It also provides popular inference engines such as vLLM and Ollama.

1. Configure your inference deployment

Select or enter the container image for your inference engine, along with its tag and container port.

To make the endpoint private, select the “Make it a private endpoint?” option. This automatically generates a TLS certificate, which you can download as a .pem file after the deployment is complete. Only clients with this TLS certificate can access the endpoint. For more details, please refer to this resource.

For further customization, you can also configure the following optional fields.

healthcheck: HTTP health check endpoint used to check whether the inference engine is ready to accept requests. Default is None.
command and arguments: Entrypoint command and arguments to run when the container starts. Default is the entrypoint specified in the container image.
autoscaling: Set the min and max scale for your deployment. The platform scales your deployment up and down to match traffic, based on the max concurrency you set. Max concurrency is the maximum number of in-flight requests per replica. Default is infinity.
environment variables: Pass any additional environment variables to the container, for example HF_TOKEN.
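
To build intuition for how min scale, max scale, and max concurrency interact, the arithmetic can be sketched as follows. This is only an illustration under a simple assumption (replicas are sized so that no replica exceeds max concurrency); it is not the platform's actual scaling algorithm, and the function and parameter names are hypothetical.

```python
import math


def replicas_needed(in_flight: int, max_concurrency: int,
                    min_scale: int, max_scale: int) -> int:
    """Estimate how many replicas keep each one at or below max_concurrency.

    Illustrative only: the real autoscaler may use different signals.
    """
    if max_concurrency <= 0:
        raise ValueError("max_concurrency must be positive")
    if min_scale > max_scale:
        raise ValueError("min_scale must not exceed max_scale")
    # Replicas required if requests were spread evenly across replicas.
    wanted = math.ceil(in_flight / max_concurrency)
    # Clamp the result to the configured min and max scale.
    return max(min_scale, min(max_scale, wanted))


if __name__ == "__main__":
    print(replicas_needed(45, 10, min_scale=1, max_scale=8))   # 5
    print(replicas_needed(200, 10, min_scale=1, max_scale=8))  # 8 (capped by max scale)
    print(replicas_needed(0, 10, min_scale=1, max_scale=8))    # 1 (held at min scale)
```

Under this assumption, a lower max concurrency leads to more replicas for the same traffic, while max scale caps the total.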

To build your own custom container image and deploy it on CentML Platform, please refer to this resource.
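
As a rough sketch of what a custom container might run so that the healthcheck field has something to point at, the following uses only the Python standard library. The /health path, the PORT environment variable, and the default port 8080 are assumptions made for illustration; use whatever path and container port you configure for your deployment.

```python
import json
import os
from http.server import BaseHTTPRequestHandler, HTTPServer

HEALTH_PATH = "/health"  # assumption: set the healthcheck field to this path


def route(path: str):
    """Return (status_code, payload) for a GET request path."""
    if path == HEALTH_PATH:
        return 200, {"status": "ok"}
    return 404, {"error": "not found"}


class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        status, payload = route(self.path)
        body = json.dumps(payload).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve() -> None:
    """Listen on all interfaces so the container port is reachable."""
    port = int(os.environ.get("PORT", "8080"))  # assumption: PORT env variable
    HTTPServer(("0.0.0.0", port), Handler).serve_forever()
```

The container's entrypoint would call serve(). A real inference server would add its model routes next to the health route and report healthy only once the model has loaded.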

2. Select the cluster and hardware to deploy

By default, CentML Platform provides several managed clusters and GPU instances for you to deploy your inference containers.

Select the regional cluster and hardware instance that best fit your needs and click Deploy.

You can integrate your own private cluster into the Platform through our bring-your-own-infrastructure support. To get started, please get in touch with us at support@centml.ai.

3. Monitor your deployment

After deploying, you can see all your deployments in the listing view, along with their current status.

Click a deployment to view its details page, logs, and monitoring information.

Once the deployment status is ready, the container port is exposed at the endpoint URL shown on the details page and can be accessed through https://<endpoint_url>.
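
A client can check that the endpoint responds before sending real traffic. The sketch below is one possible approach using only the Python standard library. The helper names, the /health path, and the example hostname are hypothetical, and a private endpoint would additionally need the downloaded TLS certificate, which this sketch does not cover.

```python
import time
import urllib.error
import urllib.request


def endpoint_url(host: str, path: str = "/") -> str:
    """Build https://<endpoint_url>/<path> from the value on the details page."""
    host = host.strip()
    if host.startswith("https://"):
        host = host[len("https://"):]
    return "https://" + host.rstrip("/") + "/" + path.lstrip("/")


def http_status(url: str) -> int:
    """Return the HTTP status code of a GET request to url."""
    with urllib.request.urlopen(url, timeout=5) as resp:
        return resp.status


def wait_until_ready(url: str, fetch_status=http_status,
                     attempts: int = 10, delay_s: float = 3.0) -> bool:
    """Poll url until it answers 200 or the attempts run out."""
    for attempt in range(attempts):
        try:
            if fetch_status(url) == 200:
                return True
        except (urllib.error.URLError, OSError):
            pass  # not reachable yet, or returned an error status
        if attempt < attempts - 1:
            time.sleep(delay_s)
    return False
```

For example, wait_until_ready(endpoint_url("my-deployment.example.com", "health")) returns True once the hypothetical health path answers with HTTP 200.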

What’s Next

LLM Serving

Explore dedicated public and private endpoints for production model deployments.

Clients

Learn how to interact with the CentML platform programmatically.

Resources and Pricing

Learn more about the CentML platform’s pricing.

Private Inference Endpoints

Learn how to create private inference endpoints.

Submit a Support Request

Submit a Support Request.

Agents on CentML

Learn how agents can interact with CentML services.
