Completions
The Avian API's base URL is https://api.avian.io
/v1/completions
Bearer avian-XX_XXXXXXXXXXXXXXXXX
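To illustrate how these pieces fit together, here is a minimal Python sketch of a request to this endpoint using the requests library; the model identifier and the shape of the response body are assumptions made for the example, not taken from this page.

```python
import requests

API_KEY = "avian-XX_XXXXXXXXXXXXXXXXX"  # your Avian API key

# Minimal completions request. The model id below is a placeholder assumption;
# substitute whichever model you have access to.
response = requests.post(
    "https://api.avian.io/v1/completions",
    headers={
        "Authorization": f"Bearer {API_KEY}",
        "Content-Type": "application/json",
    },
    json={
        "model": "Meta-Llama-3.1-8B-Instruct",  # placeholder model id
        "prompt": "Once upon a time",
    },
)
response.raise_for_status()
print(response.json())
```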
The prompt to generate completions for.
For base models, this may be a free-form string, such as one or more paragraphs of text from a novel (in which case the model may generate a continuation of the text).
For chat or instruct models, the messages should ideally be formatted according to the model's preferred format. For the llama-3.1-instruct family of models, the prompt format is specified in the model card (link).
Note that if you use our chat endpoint (link), we will automatically format the messages for you.
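As a rough sketch of what an instruct-formatted prompt looks like, the snippet below reconstructs the Llama 3.1 instruct chat template from memory; treat the special tokens as an assumption and verify them against the model card linked above.

```python
# Rough sketch of the Llama 3.1 instruct prompt format; verify against the model card,
# since the special tokens here are reproduced from memory rather than from this page.
def format_llama31_prompt(system: str, user: str) -> str:
    return (
        "<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\n"
        f"{system}<|eot_id|>"
        "<|start_header_id|>user<|end_header_id|>\n\n"
        f"{user}<|eot_id|>"
        "<|start_header_id|>assistant<|end_header_id|>\n\n"
    )

prompt = format_llama31_prompt(
    "You are a helpful assistant.",
    "Summarize the plot of Hamlet in two sentences.",
)
```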
The number of completions to generate.
How many tokens the model is allowed to generate before being stopped.
Note that this only controls how many tokens can be generated (before the response is cut off), not how many will be generated. Setting this to a high value will not make the replies more verbose, and conversely, setting it to a low value will not make the replies more concise.
The maximum value is 128K tokens (128 * 1024 = 131072).
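For illustration, the two parameters above might be sent in a request body like the following; the field names n and max_tokens are assumed (OpenAI-style) rather than quoted from this page, so check them against the parameter names shown in this reference.

```python
# Field names are assumptions (OpenAI-style); adjust to the names used by the API.
payload = {
    "model": "Meta-Llama-3.1-8B-Instruct",  # placeholder model id
    "prompt": "Write a haiku about mountains.",
    "n": 2,             # number of completions to generate
    "max_tokens": 256,  # upper bound on generated tokens, not a target length
}
```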
Temperature controls how much the model is allowed to deviate from standard behavior.
Lower values will make the model more conservative in its responses, and values like 0 will make the model deterministic (i.e. it will always generate the same output for the same input).
Higher values, like 0.8, will make the model more creative in its responses, meaning it will take more risks and generate more unexpected outputs.
Values above 1 are not recommended, as they can lead to nonsensical outputs, but we allow them for experimentation.
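For example, a deterministic, conservative request versus a more creative one might look like this; the field names and values are illustrative assumptions.

```python
# temperature = 0 for repeatable output, higher values for more varied output.
deterministic = {
    "model": "Meta-Llama-3.1-8B-Instruct",  # placeholder model id
    "prompt": "List three uses for a paperclip.",
    "temperature": 0,
}

creative = {
    "model": "Meta-Llama-3.1-8B-Instruct",  # placeholder model id
    "prompt": "List three uses for a paperclip.",
    "temperature": 0.8,
}
```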
Whether to stream the response or not.
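A sketch of consuming a streamed response with requests, assuming the endpoint emits server-sent events ("data: {...}" lines ending with "data: [DONE]") the way OpenAI-compatible APIs usually do; that framing is an assumption, not something confirmed on this page.

```python
import json
import requests

API_KEY = "avian-XX_XXXXXXXXXXXXXXXXX"  # your Avian API key

# Assumes an SSE-style stream of "data: {...}" lines terminated by "data: [DONE]".
with requests.post(
    "https://api.avian.io/v1/completions",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={
        "model": "Meta-Llama-3.1-8B-Instruct",  # placeholder model id
        "prompt": "Tell me a short story about a lighthouse.",
        "stream": True,
    },
    stream=True,  # let requests hand us the body incrementally
) as response:
    response.raise_for_status()
    for line in response.iter_lines():
        if not line:
            continue
        decoded = line.decode("utf-8")
        if decoded.startswith("data: "):
            chunk = decoded[len("data: "):]
            if chunk == "[DONE]":
                break
            print(json.loads(chunk))
```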
Frequency penalty controls how much the model is allowed to repeat itself.
Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the likelihood of the model repeating the same lines over and over.
Presence penalty controls the model's likelihood of talking about new topics.
Positive values penalize tokens based on whether they have already appeared in the text so far, making the model more likely to introduce new topics.
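To illustrate both penalties in one request body; the values and the frequency_penalty / presence_penalty field names are assumptions based on the usual OpenAI-style naming.

```python
payload = {
    "model": "Meta-Llama-3.1-8B-Instruct",  # placeholder model id
    "prompt": "Brainstorm names for a coffee shop.",
    "frequency_penalty": 0.5,  # discourage repeating the same tokens/lines
    "presence_penalty": 0.6,   # nudge the model toward new topics
}
```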
top_p is an alternative to sampling with temperature: a top_p value of 0.1 means the model will be forced to choose from the tokens that make up the top 10% of the probability distribution, making the output more conservative and less creative.
Low values of top_p have a similar effect to low values of temperature (i.e. they make the model more conservative).
While we allow you to set both temperature and top_p at the same time, we recommend adjusting only one of them, as they can interfere with each other in ways that are very likely not what you want.
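Following the recommendation above, a sketch that restricts sampling with top_p only and leaves temperature at its default; the field names are assumed.

```python
payload = {
    "model": "Meta-Llama-3.1-8B-Instruct",  # placeholder model id
    "prompt": "Suggest a title for a mystery novel.",
    "top_p": 0.1,  # sample only from the top 10% of probability mass; temperature left unset
}
```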
Seed is an arbitrary integer that allows you to make the model's output deterministic. Using the same seed will make the model generate the same output for the same input - though note that this is not always guaranteed, as different versions of the model/hardware/libraries may produce slightly different outputs.
If you do not provide a seed, we will generate one for you, making the output non-deterministic.
Note that lower temperature values will tend to make the output more similar, regardless of the seed.
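To illustrate, repeating the same request with the same seed (and the same prompt and sampling parameters) should usually reproduce the same completion; the seed field name and the values here are assumptions.

```python
payload = {
    "model": "Meta-Llama-3.1-8B-Instruct",  # placeholder model id
    "prompt": "Pick a random animal and describe it in one sentence.",
    "temperature": 0.7,
    "seed": 1234,  # send the same seed again to (usually) reproduce this output
}
```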