The NVIDIA CCluster simplifies deploying your custom models and AI applications. Package your model or application as a containerized HTTP server, then deploy it on the NVIDIA CCluster using the General Inference deployment option. Here’s an example of how to build a stable diffusion model served by a FastAPI-based inference server: https://github.com/CentML/codex/tree/main/general-apps/general-inference/stable-diffusion/centml-endpoint
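The linked example uses FastAPI; to illustrate the general pattern of a containerized HTTP inference server without any third-party dependencies, here is a minimal standard-library sketch. The `/generate` route, port 8080, and the `run_model` stub are illustrative assumptions, not part of the linked example.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def run_model(prompt: str) -> dict:
    # Hypothetical stand-in for real model inference (e.g. a diffusion
    # pipeline); returns a JSON-serializable result.
    return {"prompt": prompt, "output": f"generated for: {prompt}"}


class InferenceHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        if self.path != "/generate":
            self.send_error(404)
            return
        # Read and parse the JSON request body.
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length) or b"{}")
        result = run_model(payload.get("prompt", ""))
        body = json.dumps(result).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        # Keep per-request access logs quiet in this sketch.
        pass


def serve(port: int = 8080):
    # Bind to all interfaces so the port can be exposed from the container.
    HTTPServer(("0.0.0.0", port), InferenceHandler).serve_forever()
```

Whatever framework you use, the key points are the same: the server listens on a fixed port inside the container, and that port is the one you supply as the container port when deploying.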
The NVIDIA CCluster enforces Pod Security Admission (PSA) baseline controls at the namespace level. Under the baseline policy, containers can run as any user, including root, but privileged containers are not permitted.
If you are building the container on macOS (in particular on Apple Silicon, where images default to linux/arm64), make sure to set the image platform to linux/amd64:
docker build --platform linux/amd64 -t stable-diffusion:latest .
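The build command above assumes a Dockerfile in the build context. A minimal sketch for a Python-based inference server might look like the following; the base image, file names, and port are illustrative assumptions, not taken from the linked example:

```dockerfile
# Hypothetical Dockerfile sketch for a Python HTTP inference server.
FROM python:3.11-slim

WORKDIR /app

# Install dependencies first so this layer is cached between builds.
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy the application code (assumed entry point: server.py).
COPY . .

# The port the server listens on; supply the same value as the
# container port when configuring the deployment.
EXPOSE 8080

CMD ["python", "server.py"]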
After building the container, push it to a container registry. The NVIDIA CCluster supports both public registries (e.g., Docker Hub) and private registries (e.g., Amazon ECR, Google Artifact Registry, Azure Container Registry, or self-hosted registries).

For public images, navigate to the General Inference option in the NVIDIA CCluster and enter the image name, tag, and container port to deploy your application.

For private images, you will need to provide registry credentials so the CCluster can pull the image. You can either store credentials in your Vault as a Registry Credentials item and select them during deployment, or enter the registry username and password directly when configuring the deployment. The CCluster automatically detects the registry from the image URL.
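As a concrete illustration of the push step, the commands below tag the locally built image and push it to a registry. The registry host, username, and repository name are placeholders you would replace with your own:

```shell
# Authenticate to the registry (omit the host for Docker Hub).
docker login registry.example.com

# Tag the local image with the registry path, then push it.
docker tag stable-diffusion:latest registry.example.com/myuser/stable-diffusion:latest
docker push registry.example.com/myuser/stable-diffusion:latest
```

The full image URL you push (e.g. `registry.example.com/myuser/stable-diffusion:latest`) is what you enter when configuring the deployment; the CCluster infers the registry from it.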

What’s Next