Chat Completions
The Avian API's base URL is https://api.avian.io
The number of completions to generate
How many tokens the model is allowed to generate before being stopped.
Note that this only controls how many tokens can be generated (before the response is cut off), not how many will be generated. Setting this to a high value will not make the replies more verbose, and conversely, setting it to a low value will not make the replies more concise.
The maximum value is 128K tokens (128 * 1024 = 131072).
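As a rough illustration of the token cap, the sketch below builds a request body and rejects values above the stated maximum. The field names `model`, `messages` and `max_tokens`, the placeholder model name, and the lower bound of 1 are assumptions for illustration, not confirmed by this page.

```python
import json

MAX_TOKENS_LIMIT = 128 * 1024  # 131072, the maximum stated above


def build_request(messages, max_tokens, model="example-model"):
    """Build a chat completion request body with a validated token cap.

    Field names and the model name are assumed, not taken from this page.
    """
    # The lower bound of 1 is an assumption; only the upper bound is documented.
    if not 1 <= max_tokens <= MAX_TOKENS_LIMIT:
        raise ValueError(f"max_tokens must be between 1 and {MAX_TOKENS_LIMIT}")
    return {"model": model, "messages": messages, "max_tokens": max_tokens}


body = build_request([{"role": "user", "content": "Hello"}], max_tokens=256)
print(json.dumps(body, indent=2))
```

A cap of 256 here only bounds the reply length; it does not ask the model for a 256-token answer.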
Temperature controls how much randomness the model uses when choosing each token.
Lower values will make the model more conservative in its responses, and a value of 0 will make the model deterministic (i.e. it will always generate the same output for the same input).
Higher values, like 0.8, will make the model more creative in its responses, meaning it will take more risks and generate more unexpected outputs.
Values above 1 are not recommended, as they can lead to nonsensical outputs, but we allow them for experimentation.
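To make the effect of temperature concrete, here is a minimal sketch of temperature-scaled softmax, the usual way sampling temperature is applied to token scores. It is a generic illustration with made-up scores, not a description of how the Avian backend is implemented.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Convert raw token scores to probabilities, scaled by temperature."""
    if temperature <= 0:
        # Treat 0 as greedy decoding: all mass on the highest-scoring token.
        best = max(range(len(logits)), key=lambda i: logits[i])
        return [1.0 if i == best else 0.0 for i in range(len(logits))]
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.5]  # made-up scores for three candidate tokens
for t in (0, 0.2, 0.8, 1.5):
    print(t, [round(p, 3) for p in softmax_with_temperature(logits, t)])
```

As the temperature rises, probability moves away from the top-scoring token toward the alternatives, which is why high values produce more surprising output.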
Whether to stream the response or not.
Frequency penalty controls how much the model is allowed to repeat itself.
Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the likelihood of the model repeating the same lines over and over.
Presence penalty controls the model's likelihood of talking about new topics.
Positive values penalize tokens based on whether they have already appeared in the text so far, making the model more likely to introduce new topics.
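The two penalties can be sketched as adjustments to each token's score before sampling. The formula below (subtract the frequency penalty once per prior occurrence, and the presence penalty once if the token has appeared at all) is a common convention and is assumed here; this page does not state the exact formula the API uses.

```python
from collections import Counter


def apply_penalties(logits, generated_tokens,
                    frequency_penalty=0.0, presence_penalty=0.0):
    """Return token scores adjusted for repetition (assumed formula)."""
    counts = Counter(generated_tokens)
    adjusted = {}
    for token, logit in logits.items():
        count = counts[token]
        # Frequency penalty grows with each repeat; presence penalty is flat.
        presence = presence_penalty if count > 0 else 0.0
        adjusted[token] = logit - count * frequency_penalty - presence
    return adjusted


logits = {"the": 1.0, "cat": 1.0, "dog": 1.0}  # made-up scores
history = ["the", "the", "cat"]
print(apply_penalties(logits, history, frequency_penalty=0.5, presence_penalty=0.2))
```

In this example "the" is penalized most because it appeared twice, "cat" less, and "dog" not at all.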
top_p is an alternative to sampling with temperature. A top_p value of 0.1 means the model will be forced to choose only from the most likely tokens that together make up the top 10% of the probability mass, making the output more conservative and less creative.
Low values of top_p have a similar effect to low values of temperature (i.e. they make the model more conservative).
While we allow you to set both temperature and top_p at the same time, we recommend using only one of them, as they can interfere with each other. Setting both is very likely not what you want to do.
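The following is a minimal sketch of top_p (nucleus) filtering over a made-up distribution. It shows the general technique, keeping the smallest set of most likely tokens whose cumulative probability reaches top_p, and is not a description of the server's actual implementation.

```python
def top_p_filter(probs, top_p):
    """Keep the most likely tokens until their cumulative probability
    reaches top_p, then renormalize the survivors."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    kept, cumulative = [], 0.0
    for token, p in ranked:
        kept.append((token, p))
        cumulative += p
        if cumulative >= top_p:
            break
    total = sum(p for _, p in kept)
    return {token: p / total for token, p in kept}


probs = {"the": 0.5, "a": 0.3, "this": 0.15, "zebra": 0.05}  # made-up values
print(top_p_filter(probs, 0.1))  # only the single most likely token survives
print(top_p_filter(probs, 0.9))  # the unlikely tail ("zebra") is dropped
```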
Seed is an arbitrary integer that allows you to make the model's output deterministic. Using the same seed will make the model generate the same output for the same input, though this is not always guaranteed, as different versions of the model, hardware, or libraries may generate slightly different outputs.
If you do not provide a seed, we will generate one for you, making the output non-deterministic.
Note that lower temperature values will tend to make outputs more similar to each other, regardless of the seed.
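As a local analogy for how a seed fixes the outcome of random sampling, the sketch below draws tokens from a made-up distribution using Python's seeded random generator. It only illustrates the idea of seeded sampling and does not call the API.

```python
import random


def sample_tokens(probs, n, seed=None):
    """Draw n tokens from a distribution; a fixed seed repeats the draw."""
    rng = random.Random(seed)  # seed=None lets Python pick its own seed
    tokens = list(probs)
    weights = [probs[t] for t in tokens]
    return rng.choices(tokens, weights=weights, k=n)


probs = {"the": 0.5, "a": 0.3, "this": 0.2}  # made-up values
first = sample_tokens(probs, 8, seed=42)
second = sample_tokens(probs, 8, seed=42)
print(first == second)  # same seed, same input: identical draws
```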
Use stop to control when the model should stop generating text.
You may pass up to 4 sequences, and the model will stop generating text when it encounters any of them.
We default to ["<|eot_id|>", "<|eom_id|>", "<|end_of_text|>"], which are tokens that the model recognizes as generation boundaries, but you may turn off this behavior by passing an empty array [], or provide your own sequences.
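The stop behavior can be approximated client-side as cutting the text at the earliest stop sequence. This sketch assumes the stop sequence itself is excluded from the output, which this page does not specify; the helper name is hypothetical.

```python
MAX_STOP_SEQUENCES = 4
DEFAULT_STOP = ["<|eot_id|>", "<|eom_id|>", "<|end_of_text|>"]


def truncate_at_stop(text, stop=None):
    """Cut text at the earliest stop sequence (hypothetical helper)."""
    sequences = DEFAULT_STOP if stop is None else stop
    if len(sequences) > MAX_STOP_SEQUENCES:
        raise ValueError("at most 4 stop sequences are allowed")
    # Ignore empty strings, which would otherwise match at position 0.
    found = [text.find(s) for s in sequences if s]
    positions = [i for i in found if i != -1]
    cut = min(positions) if positions else len(text)
    return text[:cut]


print(truncate_at_stop("Hello there<|eot_id|>ignored"))
print(truncate_at_stop("line one\nline two", stop=["\n"]))
print(truncate_at_stop("kept<|eot_id|>also kept", stop=[]))
```

Passing an empty list disables the defaults, mirroring the behavior described above.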
A list of tools that the model may call in order to obtain additional information or perform actions.
Note that the model does not actually call these tools - it only provides structured data so that you can execute them in whatever library or executor you prefer, and then pass the results of the execution back to the model.
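The round trip described above might look like the following sketch: the model returns a structured tool call, your code runs the matching function, and the result is packaged as a message to send back. The tool-call shape (`id`, `function.name`, `function.arguments` as a JSON string), the `tool` role, and the `get_weather` function are all assumptions for illustration; this page does not define the exact schema.

```python
import json


def get_weather(city):
    """Hypothetical local tool that returns canned data."""
    return {"city": city, "forecast": "sunny"}


# Registry mapping tool names to the local functions that implement them.
TOOLS = {"get_weather": get_weather}


def execute_tool_call(tool_call):
    """Run the requested tool locally and build the follow-up message."""
    name = tool_call["function"]["name"]
    arguments = json.loads(tool_call["function"]["arguments"])
    result = TOOLS[name](**arguments)
    return {
        "role": "tool",  # assumed role name for tool results
        "tool_call_id": tool_call["id"],
        "content": json.dumps(result),
    }


# A tool call shaped the way the model is assumed to return it.
example_call = {
    "id": "call_1",
    "type": "function",
    "function": {"name": "get_weather", "arguments": '{"city": "Paris"}'},
}
message = execute_tool_call(example_call)
print(message)
```

The resulting message would be appended to the conversation and sent in the next request so the model can use the tool's output.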
Returns the chat completion response.
Note that the response can be either static or streaming.
You can see examples of both types of responses by selecting Static completion response or Streaming completion response from the dropdown menu, which is located below the example box.
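For the streaming case, a client typically reassembles the reply from a sequence of chunks. The sketch below assumes a server-sent-events style stream in which each line looks like `data: {json}`, each chunk carries text under `choices[0].delta.content`, and the stream ends with `data: [DONE]`. None of this is confirmed by this page, so check the example responses before relying on it.

```python
import json


def collect_stream_text(lines):
    """Join the text fragments from an assumed SSE-style chunk stream."""
    parts = []
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank lines and anything that is not a data line
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":  # assumed end-of-stream marker
            break
        chunk = json.loads(payload)
        delta = chunk["choices"][0].get("delta", {})
        parts.append(delta.get("content") or "")
    return "".join(parts)


# Made-up chunks in the assumed format.
sample_stream = [
    'data: {"choices": [{"delta": {"role": "assistant"}}]}',
    "",
    'data: {"choices": [{"delta": {"content": "Hel"}}]}',
    'data: {"choices": [{"delta": {"content": "lo"}}]}',
    "data: [DONE]",
]
print(collect_stream_text(sample_stream))
```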