Retrieval-Augmented Generation (RAG) leverages external knowledge sources to enhance the accuracy and relevance of text generated by large language models. At query time, RAG retrieves documents relevant to the user's question and supplies them to the LLM as context, grounding its outputs in real-world information and enabling more comprehensive, informative answers.
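The retrieve-then-generate flow can be sketched in a few lines. This is a minimal illustration only: the bag-of-words `embed` function is a toy stand-in for a real embedding model, and the corpus is hypothetical sample data, not anything the platform ships.

```python
from collections import Counter
import math

# Toy corpus standing in for an uploaded knowledge base (hypothetical data).
documents = [
    "The A10G is a cost-effective GPU for serving small embedding models.",
    "RAG retrieves relevant documents and passes them to the language model.",
    "Markdown files are split into chunks before they are embedded.",
]

def embed(text):
    # Stand-in for a real embedding model: a term-frequency vector.
    return Counter(text.lower().split())

def cosine(a, b):
    # Cosine similarity between two sparse term-frequency vectors.
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, k=1):
    # Rank documents by similarity to the query and keep the top k.
    q = embed(query)
    ranked = sorted(documents, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]

def build_prompt(query):
    # Augment the user's question with retrieved context before generation.
    context = "\n".join(retrieve(query))
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"

prompt = build_prompt("How does RAG use retrieved documents?")
```

In a real deployment the embedding model you select in step 1 replaces `embed`, and the assembled prompt is sent to the chosen language model.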

1. Configure your RAG application

Name your RAG application and select the embedding and language models from the drop-down lists.

Click Next.

2. Select the cluster and hardware for deployment

Select the cluster and hardware instance where you want to deploy the embedding model.

Embedding models are typically small and run efficiently on lower-end GPUs such as the A10G or L4, making these more cost-effective hardware options.

Click Deploy.

3. Use the RAG deployment

Once the deployment is ready, go to the Playground tab on the deployment details page.

Upload PDF, Markdown, or text files to the RAG application and ask questions about their contents.
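The playground handles file ingestion for you, but conceptually each uploaded file is split into overlapping chunks before it is embedded and indexed. A minimal sketch of that chunking step (the `chunk_text` helper and its parameters are illustrative assumptions, not the platform's actual implementation):

```python
def chunk_text(text, max_words=100, overlap=20):
    # Split text into overlapping word-based chunks; the overlap keeps
    # sentences that straddle a boundary retrievable from either chunk.
    words = text.split()
    step = max_words - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunk = " ".join(words[start:start + max_words])
        if chunk:
            chunks.append(chunk)
        if start + max_words >= len(words):
            break
    return chunks
```

Each resulting chunk is embedded by the model deployed in step 2, so questions asked in the playground are matched against chunks rather than whole documents.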