How to Get the Most out of Your Serverless Endpoints
Select the Serverless option from the sidebar menu.
The Your Serverless Endpoints screen allows you to customize the following endpoint settings:
Click the API option under the View section on the right-hand side of the screen to reveal the relevant code snippets.
Select the Python, JavaScript, or cURL options to view examples in those respective languages.
Note that the editor is read-only.
To execute the code snippets, you’ll need to run the commands from a separate terminal on a local or remote machine.
The following sections provide example prompts in all supported languages.
To run the cURL commands, copy the platform-provided code into your terminal (with cURL installed) and execute it. Note that you must add your API token to the curl command.
Example cURL Command
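The exact snippet comes from the Serverless UI; the sketch below only illustrates its general shape. The model name is a placeholder, and the command reads your Serverless API key from the CENTML_API_KEY environment variable so the token never lands in your shell history.

```shell
# Hypothetical sketch; copy the exact snippet from the Serverless UI.
# First export your key: export CENTML_API_KEY=<your Serverless API key>
BASE_URL="https://api.centml.com/openai/v1"

# Only send the request once an API key is configured.
if [ -n "$CENTML_API_KEY" ]; then
  curl "$BASE_URL/chat/completions" \
    -H "Content-Type: application/json" \
    -H "Authorization: Bearer $CENTML_API_KEY" \
    -d '{
      "model": "your-model-name",
      "messages": [{"role": "user", "content": "Hello!"}]
    }'
fi
```

Replace "your-model-name" with the model identifier shown next to your endpoint in the UI.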
To get started with the Python example, run pip3 install openai from your preferred terminal or Python environment.
Once installed, save the Python script from the UI into a file (we named ours centml-serverless.py) and run it using python3 <your file name> (i.e., python3 centml-serverless.py). Note that you will have to ensure you’ve added your API token to the script.
Example Python Script
To run the JavaScript example, you’ll first need to install the OpenAI library. You can do so by running npm install openai or yarn add openai in your terminal.
Once the OpenAI library has been installed, save the platform-provided code snippet in a file ending with .js. We named ours index.js. When you are ready, run the command node <your file>.js (i.e., node index.js). Note that you will have to ensure you’ve added your API token to the script.
You should see an LLM response!
Example JavaScript Code
Invalid API Key
If you receive an Invalid API key error, ensure you are using your Serverless API key and not a different key from your vault. It may also help to double-check your syntax, such as base_url in Python vs. baseURL in JavaScript.
You can find your Serverless API key under the API option on the right side of the Serverless UI. This is the same location where example API call snippets are provided.
CentML does not currently provide a way for you to view or extract your chat histories.
If you hit the concurrency limit, you will see a message like {"error":"Concurrent request limit reached, please try again later"}. When using the Chat UI or APIs directly, you are restricted to a limited number of requests based on demand.
CentML Serverless endpoints are multi-user; they are neither a dedicated deployment of a specific LLM nor a production chat application.
CentML serverless endpoints are designed to help you quickly test new models and collect some base performance metrics before moving to a dedicated endpoint such as LLM Serving.
You may choose to use OpenRouter for a higher level of concurrency and guaranteed performance should you not want a dedicated endpoint.
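When the limit message appears, retrying after a short backoff is usually enough. The helper below is an illustrative sketch, not part of the CentML client; the RuntimeError stands in for whatever rate-limit error your HTTP client raises (typically an HTTP 429).

```python
import random
import time

def with_retries(call, max_attempts=5, base_delay=1.0):
    """Retry `call` with exponential backoff when the endpoint reports
    the concurrent-request limit (an HTTP 429 on a real deployment)."""
    for attempt in range(max_attempts):
        try:
            return call()
        except RuntimeError as exc:  # stand-in for your client's rate-limit error
            if "Concurrent request limit" not in str(exc) or attempt == max_attempts - 1:
                raise
            # Back off exponentially, with jitter to avoid synchronized retries.
            time.sleep(base_delay * (2 ** attempt) + random.random() * 0.1)
```

You would wrap your API call in it, e.g. with_retries(lambda: client.chat.completions.create(...)).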
You can request a new model by selecting + Request a model from the configuration menu on the right side of the screen and filling out the form.
Once submitted, the CentML team will review the submitted request and respond in a timely manner.
Please do not submit multiple tickets with the same model request; that will not expedite the process. Should you need an escalation, feel free to reach out to the sales team you’ve been engaged with or contact sales@centml.ai.
Request a Model
Description:
We are reaching out to request the addition of a new Large Language Model (LLM) to your serverless API. Our team is currently evaluating various LLMs for our natural language processing (NLP) use cases, and we believe that integrating this new model will enhance our capabilities.

Business Use Case: Our primary use case is to leverage the new LLM for text classification, sentiment analysis, and language translation tasks. We anticipate that this model will provide more accurate results compared to our current models, enabling us to improve our customer experience and gain a competitive edge. Specifically, we plan to use the new model to:
- Analyze customer feedback and sentiment on our platform.
- Classify and route customer inquiries to the relevant support teams.
- Translate user-generated content to facilitate global communication.

Limitations: While the current hosted LLMs show promise, our testing has revealed limitations that impact their suitability for our use cases. Specifically:
- Lack of domain-specific fine-tuning affects text classification accuracy.
- Sentiment analysis is biased towards certain emotional tones or language styles.
- Language translation struggles with domain-specific content, idioms, and colloquialisms.

We’re concerned that these limitations may compromise the accuracy and reliability of our customer-facing applications. We’re looking for alternative models or customizations that can address these issues.

Concurrency Considerations: We understand that serverless APIs are designed to handle variable workloads, and we expect our usage to be moderate. Initially, we anticipate an average of 10 requests per minute, with occasional spikes to 50 requests per minute during peak hours. We believe that the serverless API can handle this concurrency level without significant performance degradation.

Future Plans: After evaluating the performance of the new model, we anticipate that our usage will grow, and we may eventually require a dedicated endpoint to handle our workload. We are working towards assessing the model’s performance and scalability, and we expect to migrate to a dedicated endpoint if our request volume exceeds 500 requests per minute. We would like to request guidance on the process for migrating to a dedicated endpoint, should it become necessary.
| Title | OpenAI-Compatible Endpoint |
|---|---|
| Chat Completions | POST /v1/chat/completions |
| Completions | POST /v1/completions |
Use the base URL https://api.centml.com/openai/v1/ as shown in the examples.