CentML provides dedicated LLM endpoints, general inference, and compute deployments — all backed by optimized infrastructure. No need to worry about GPU provisioning, scaling, or maintenance — just log in and deploy.

1. Log into the NVIDIA CCluster

To get started, navigate to https://app.centml.com in your browser and log in with your credentials.
If you need guidance on how to create an account, please visit the Creating an Account documentation. Once logged in, you will see the NVIDIA CCluster console home page, as shown below.

2. Create a Bearer Token

To interact with CentML endpoints programmatically, you need a Bearer Token. Follow the Managing Vault Objects documentation to generate one.
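Once you have a token, every programmatic request carries it in a standard HTTP `Authorization: Bearer` header. A minimal sketch in Python (the Bearer scheme itself is standard HTTP; the `CENTML_TOKEN` environment-variable name below is an illustrative choice, not a CentML convention):

```python
import os

def auth_headers(token: str) -> dict:
    """Build the HTTP headers for an authenticated request.

    Bearer auth is standard HTTP; the env-var name used below is
    a placeholder, not an official CentML convention.
    """
    return {
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
    }

# Read the token from the environment rather than hard-coding it.
token = os.environ.get("CENTML_TOKEN", "")
headers = auth_headers(token)
```

Keeping the token in an environment variable (or a secrets manager) rather than in source code avoids accidentally committing credentials.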

3. Deploy Your First Model

Choose the deployment type that fits your use case:
  • LLM Serving — Deploy dedicated public or private LLM endpoints tailored to your performance requirements and budget.
  • General Inference — Deploy custom containerized models on CentML infrastructure.
  • Compute — Provision GPU compute for training, fine-tuning, or batch workloads.
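For an LLM Serving deployment, a chat-style request can be sketched as follows. This assumes an OpenAI-compatible request shape; the endpoint URL, model name, and token below are placeholders, so check your deployment's details page for the actual values:

```python
import json
import urllib.request

def chat_request(url: str, token: str, model: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) a chat-completion request.

    Assumes an OpenAI-style request body; the URL, model name,
    and token are placeholders for your deployment's values.
    """
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 128,
    }).encode()
    return urllib.request.Request(
        url,
        data=body,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

# To actually send it (requires a live endpoint and a valid token):
# with urllib.request.urlopen(chat_request(url, token, model, "Hello")) as resp:
#     print(json.load(resp))
```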

Additional Support: Billing, Sales, and Technical

For billing or sales support, reach out to sales@centml.ai. You can also file a support request by following our Requesting Support guide; support requests are not limited to sales and billing and can cover technical questions as well. Please don't hesitate to reach out. We're here to help!

What’s Next

Agents on CentML

Learn how agents can interact with CentML services.

Clients

Learn how to interact with the NVIDIA CCluster programmatically.

Resources and Pricing

Learn more about CentML Pricing.

Get Support

Submit a Support Request

LLM Serving

Explore dedicated public and private endpoints for production model deployments.

Deploying Custom Models

Learn how to build your own containerized inference engines and deploy them on the NVIDIA CCluster.