Turnkey deployment for Retrieval-Augmented Generation
Retrieval-Augmented Generation (RAG) augments a large language model's prompt with documents retrieved from an external knowledge source, improving the accuracy and relevance of the generated text. By grounding generation in retrieved, up-to-date information, RAG enables LLMs to produce more comprehensive and informative outputs.
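The flow above can be sketched in a few lines. This is an illustrative toy, not a production pipeline: the knowledge base is hard-coded, and a simple word-overlap score stands in for a real embedding-based search.

```python
# Minimal sketch of the RAG flow: retrieve relevant context, then
# prepend it to the prompt sent to the LLM. The word-overlap scoring
# below is a toy stand-in for an embedding similarity search.

KNOWLEDGE_BASE = [
    "The A10G GPU has 24 GB of memory.",
    "RAG augments prompts with retrieved documents.",
    "Embedding models map text to dense vectors.",
]

def retrieve(query: str, docs: list[str], k: int = 1) -> list[str]:
    """Return the k documents sharing the most words with the query."""
    q_words = set(query.lower().split())
    scored = sorted(
        docs,
        key=lambda d: len(q_words & set(d.lower().split())),
        reverse=True,
    )
    return scored[:k]

def build_prompt(query: str) -> str:
    """Augment the user query with retrieved context before generation."""
    context = "\n".join(retrieve(query, KNOWLEDGE_BASE))
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"

print(build_prompt("What does RAG do to prompts?"))
```

In a real deployment, `retrieve` would query a vector index built from the embedding model you deploy in the steps below, and `build_prompt`'s output would be sent to the generative LLM endpoint.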
Select the cluster and hardware instance where you want to deploy the embedding model.
Embedding models are typically small and run efficiently on lower-end GPUs such as the A10G or L4, making these a more cost-effective hardware option.
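To see why a lower-end GPU suffices, a back-of-envelope memory estimate helps. The parameter counts below are rough, illustrative figures (a BERT-base-scale embedding model versus a 7B-parameter LLM), not measurements of any specific model.

```python
# Back-of-envelope check that a typical embedding model fits on a
# lower-end GPU. Parameter counts are rough, illustrative figures.

def model_memory_gb(num_params: float, bytes_per_param: int = 2) -> float:
    """Approximate weight memory in GB at fp16 (2 bytes per parameter)."""
    return num_params * bytes_per_param / 1e9

# A ~110M-parameter embedding model (BERT-base scale) in fp16:
embed_gb = model_memory_gb(110e6)   # ~0.22 GB of weights
# A 7B-parameter generative LLM in fp16, for comparison:
llm_gb = model_memory_gb(7e9)       # ~14 GB of weights

print(f"embedding model: ~{embed_gb:.2f} GB, LLM: ~{llm_gb:.1f} GB")
```

Both the A10G and the L4 provide 24 GB of GPU memory, so an embedding model's weights occupy only a small fraction of the card, leaving ample room for batching; a larger generative model would consume most of it.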