RAG Application
Turnkey deployment for Retrieval-Augmented Generation
Retrieval-Augmented Generation (RAG) enhances the accuracy and relevance of text generated by large language models by drawing on external knowledge sources. By retrieving relevant documents at query time and supplying them to the model as context, RAG enables LLMs to produce more comprehensive, informative, and grounded outputs.
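The core loop is simple: embed the user's question, retrieve the most similar documents, and prepend them to the prompt before generation. The toy sketch below illustrates that flow only; it uses bag-of-words vectors as a stand-in for a learned embedding model, and the documents and question are made up for the example.

```python
# Toy retrieve-then-generate loop (illustration only; a real RAG deployment
# uses a learned embedding model and an LLM, as described on this page).
from collections import Counter
import math

docs = [
    "CentML RAG deployments accept PDF, Markdown, and text files.",
    "Embedding models run efficiently on A10G and L4 GPUs.",
]

def embed(text: str) -> Counter:
    # Stand-in for a learned embedding model: bag-of-words term counts.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def retrieve(question: str, k: int = 1) -> list[str]:
    # Rank documents by similarity to the question and keep the top k.
    q = embed(question)
    return sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]

question = "Which GPUs can run the embedding model?"
context = "\n".join(retrieve(question))
# The augmented prompt is what the language model actually sees.
prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
print(prompt)
```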
1. Configure your RAG application
Name your RAG application and select the embedding and language models from the drop-down lists.
Click Next.
2. Select the cluster and the hardware to deploy
Select the cluster and hardware instance where you want to deploy the embedding model.
Embedding models are typically small and run efficiently on lower-end GPUs such as the A10G and L4, making them a more cost-effective hardware choice.
Click Deploy.
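As a rough illustration (the figures here are generic assumptions, not CentML specifications): a 100M-parameter embedding model stored in fp16 occupies about 0.2 GB of weights, a small fraction of the 24 GB of memory on an A10G or L4, whereas multi-billion-parameter language models typically call for larger accelerators.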
3. Use the RAG deployment
Once the deployment is ready, go to the Playground tab on the deployment details page.
Upload PDF, Markdown, or text files to the RAG deployment and ask questions about the information they contain.
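If you prefer to query the deployment outside the Playground, the sketch below shows one possible approach, assuming the deployment exposes an OpenAI-compatible chat endpoint; the base URL, API key, and model name are placeholders to replace with the values shown on your deployment details page.

```python
# Hypothetical programmatic query; assumes an OpenAI-compatible endpoint.
# The URL, key, and model name below are placeholders, not real values.
from openai import OpenAI

client = OpenAI(
    base_url="https://<your-deployment-endpoint>/v1",  # placeholder
    api_key="<your-api-key>",                          # placeholder
)

response = client.chat.completions.create(
    model="<your-model>",  # placeholder
    messages=[{"role": "user", "content": "Summarize the uploaded PDF."}],
)
print(response.choices[0].message.content)
```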
What’s Next
LLM Serving
Explore dedicated public and private endpoints for production model deployments.
Clients
Learn how to interact with the CentML platform programmatically.
Resources and Pricing
Learn more about the CentML platform’s pricing.
Private Inference Endpoints
Learn how to create private inference endpoints.
Submit a Support Request
Agents on CentML
Learn how agents can interact with CentML services.