RAG Application
Turnkey deployment for Retrieval-Augmented Generation
Retrieval-Augmented Generation (RAG) enhances the accuracy and relevance of text generated by large language models by drawing on external knowledge sources. By retrieving relevant documents at query time and supplying them to the model as context, RAG enables LLMs to produce more comprehensive, informative, and grounded outputs.
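The core loop is simple: embed the user's question, retrieve the most similar documents, and prepend them to the prompt before generation. The toy sketch below illustrates that flow only; it uses bag-of-words vectors as a stand-in for a learned embedding model, and the documents and question are made up for the example.

```python
# Toy retrieve-then-generate loop (illustration only; a real RAG deployment
# uses a learned embedding model and an LLM, as described on this page).
from collections import Counter
import math

docs = [
    "CentML RAG deployments accept PDF, Markdown, and text files.",
    "Embedding models run efficiently on A10G and L4 GPUs.",
]

def embed(text: str) -> Counter:
    # Stand-in for a learned embedding model: bag-of-words term counts.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def retrieve(question: str, k: int = 1) -> list[str]:
    # Rank documents by similarity to the question and keep the top k.
    q = embed(question)
    return sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]

question = "Which GPUs can run the embedding model?"
context = "\n".join(retrieve(question))
# The augmented prompt is what the language model actually sees.
prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
print(prompt)
```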
1. Configure your RAG application
Name your RAG application and select the embedding and language models from the drop-down lists.
Click Next.
2. Select the cluster and the hardware to deploy
Select the cluster and hardware instance where you want to deploy the embedding model.
Embedding models are typically small and run efficiently on lower-end GPUs such as the A10G and L4, making them a more cost-effective hardware choice.
Click Deploy.
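As a rough illustration (the figures here are generic assumptions, not CentML specifications): a 100M-parameter embedding model stored in fp16 occupies about 0.2 GB of weights, a small fraction of the 24 GB of memory on an A10G or L4, whereas multi-billion-parameter language models typically call for larger accelerators.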
3. Use the RAG deployment
Once the deployment is ready, go to the Playground tab on the deployment details page.
Upload PDF, Markdown, or text files to the RAG deployment and ask questions about the information they contain.
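If you prefer to query the deployment outside the Playground, the sketch below shows one possible approach, assuming the deployment exposes an OpenAI-compatible chat endpoint; the base URL, API key, and model name are placeholders to replace with the values shown on your deployment details page.

```python
# Hypothetical programmatic query; assumes an OpenAI-compatible endpoint.
# The URL, key, and model name below are placeholders, not real values.
from openai import OpenAI

client = OpenAI(
    base_url="https://<your-deployment-endpoint>/v1",  # placeholder
    api_key="<your-api-key>",                          # placeholder
)

response = client.chat.completions.create(
    model="<your-model>",  # placeholder
    messages=[{"role": "user", "content": "Summarize the uploaded PDF."}],
)
print(response.choices[0].message.content)
```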
What’s Next
LLM Serving
Explore dedicated public and private endpoints for production model deployments.
Clients
Learn how to interact with the CentML platform programmatically.
Resources and Pricing
Learn more about the CentML platform’s pricing.
Private Inference Endpoints
Learn how to create private inference endpoints.
Submit a Support Request
Agents on CentML
Learn how agents can interact with CentML services.