Completions
The Avian API's base URL is https://api.avian.io
The prompt to generate completions for.
For base models, this may be a free-form string, such as one or more paragraphs of text from a novel (in which case the model may generate a continuation of the text).
For chat or instruct models, the messages should ideally be formatted according to the model's preferred format. For the llama-3.1-instruct family of models, the prompt format is specified in the model card (link).
Note that if you use our chat endpoint (link), we will automatically format the messages for you.
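As a rough sketch, a raw completion request might look like the following. The endpoint path, auth scheme, and model identifier below are assumptions rather than values taken from this page, so check your dashboard and the model card for the real ones.

```python
import requests

API_KEY = "YOUR_API_KEY"           # placeholder -- use your real key
BASE_URL = "https://api.avian.io"  # base URL from above

# A free-form prompt for a base model: the model simply continues the text.
base_prompt = "It was a bright cold day in April, and the clocks were"

# For an instruct model, wrap your message in the model's chat template.
# The llama-3.1 template below follows the publicly documented format;
# verify it against the model card linked above.
instruct_prompt = (
    "<|begin_of_text|><|start_header_id|>user<|end_header_id|>\n\n"
    "Summarize the plot of Moby-Dick in two sentences."
    "<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n"
)

response = requests.post(
    f"{BASE_URL}/v1/completions",                    # hypothetical path
    headers={"Authorization": f"Bearer {API_KEY}"},  # assumed auth scheme
    json={
        "model": "Meta-Llama-3.1-8B-Instruct",       # assumed model identifier
        "prompt": instruct_prompt,
    },
)
print(response.json())
```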
How many tokens the model is allowed to generate before being stopped.
Note that this only controls how many tokens can be generated (before the response is cut off), not how many will be generated. Setting this to a high value will not make the replies more verbose, and conversely, setting it to a low value will not make the replies more concise.
The maximum value is 128K tokens (128 * 1024 = 131072).
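For illustration, a request body that caps generation at 64 tokens could look like this; the parameter name max_tokens follows common completion APIs and is assumed here.

```python
# Hypothetical request body illustrating the generation cap.
payload = {
    "model": "Meta-Llama-3.1-8B-Instruct",    # assumed model identifier
    "prompt": "List three uses for a paperclip:",
    "max_tokens": 64,       # generation is cut off after at most 64 tokens
    # "max_tokens": 131072  # the absolute maximum (128 * 1024)
}
```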
Temperature controls how much the model is allowed to deviate from its most likely output.
Lower values make the model more conservative in its responses, and a value of 0 makes the model deterministic (i.e. it will always generate the same output for the same input).
Higher values, like 0.8, make the model more creative, meaning it will take more risks and generate more unexpected outputs.
Values above 1 are not recommended, as they can lead to nonsensical outputs, but we allow them for experimentation.
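As a sketch, two request bodies that sit at opposite ends of this range (field names assumed, as above):

```python
# Deterministic: the same prompt always yields the same completion.
deterministic = {"prompt": "2 + 2 =", "temperature": 0}

# Creative: expect more varied, riskier completions across calls.
creative = {"prompt": "Write a haiku about rain.", "temperature": 0.8}
```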
Whether to stream the response or not.
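A streamed response can be consumed incrementally. The sketch below assumes the stream arrives as server-sent events (data: ... lines ending with a [DONE] sentinel), which is common for completion APIs but not confirmed on this page; compare it against the streaming example in the docs.

```python
import json
import requests

response = requests.post(
    "https://api.avian.io/v1/completions",             # hypothetical path
    headers={"Authorization": "Bearer YOUR_API_KEY"},  # assumed auth scheme
    json={
        "model": "Meta-Llama-3.1-8B-Instruct",         # assumed model identifier
        "prompt": "Once upon a time",
        "stream": True,
    },
    stream=True,  # let requests yield the body as it arrives instead of buffering it
)

for raw_line in response.iter_lines():
    if not raw_line:
        continue
    line = raw_line.decode("utf-8")
    if line.startswith("data: "):
        chunk = line[len("data: "):]
        if chunk == "[DONE]":      # assumed end-of-stream sentinel
            break
        print(json.loads(chunk))   # each chunk is a partial completion
```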
top_p is an alternative to sampling with temperature: a top_p value of 0.1 means the model is restricted to the tokens that make up the top 10% of the probability mass, making the output more conservative and less creative.
Low values of top_p have a similar effect to low values of temperature (i.e. they make the model more conservative).
While we allow you to set both temperature and top_p at the same time, we recommend adjusting only one of them, as they can interfere with each other in ways that are very likely not what you want.
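For example (field names assumed, as above):

```python
# Nucleus sampling: leave temperature at its default and narrow the token pool instead.
conservative = {"prompt": "The capital of France is", "top_p": 0.1}

# Not recommended: the two parameters interact and become hard to reason about.
# both_set = {"prompt": "...", "temperature": 0.2, "top_p": 0.1}
```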
Seed is an arbitrary integer that allows you to make the model's output deterministic. Using the same seed will make the model generate the same output for the same input - though note that this is not always guaranteed, as different versions of the model/hardware/libraries may generate slightly different outputs.
If you do not provide a seed, we will generate one for you, making the output non-deterministic.
Note that lower temperature values will tend to make the output more similar, regardless of the seed.
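A sketch of a reproducible request (assuming the parameter is simply named seed):

```python
# Reusing the same seed with an otherwise identical request should reproduce
# the same completion, subject to the hardware/library caveats above.
payload = {
    "prompt": "Name a random fruit.",
    "temperature": 0.7,
    "seed": 42,   # any integer; omit it to get non-deterministic output
}
```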
Use stop to control when the model should stop generating text.
You may pass up to 4 sequences, and the model will stop generating text when it encounters any of them.
We default to ["<|eot_id|>", "<|eom_id|>", "<|end_of_text|>"], which are tokens the model recognizes as generation boundaries. You may turn off this behavior by passing an empty array ([]), or provide your own sequences.
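For example, to stop generation before the model begins a new question in a Q&A-style prompt (field names assumed, as above):

```python
payload = {
    "prompt": "Q: What is the tallest mountain on Earth?\nA:",
    "stop": ["\nQ:", "\n\n"],  # up to 4 sequences; generation halts at the first match
}

# To disable the default boundary tokens entirely, pass an empty list:
# payload["stop"] = []
```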
Returns the chat completion response.
Note that the response can be either static or streaming.
You can see examples of both types of responses by selecting Static completion response or Streaming completion response from the dropdown menu below the example box.
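As a sketch of reading a static (non-streamed) response: the field names below follow the common OpenAI-style completion schema and are an assumption, so compare them against the Static completion response example before relying on them.

```python
import requests

response = requests.post(
    "https://api.avian.io/v1/completions",             # hypothetical path
    headers={"Authorization": "Bearer YOUR_API_KEY"},  # assumed auth scheme
    json={
        "model": "Meta-Llama-3.1-8B-Instruct",         # assumed model identifier
        "prompt": "Hello,",
        "stream": False,
    },
)
data = response.json()

# Assumed fields -- verify against the example response shown in the docs.
print(data["choices"][0]["text"])   # the generated completion
print(data.get("usage", {}))        # token accounting, if present
```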