Completions
The Avian API's base URL is https://api.avian.io
/v1/completions
Bearer avian-XX_XXXXXXXXXXXXXXXXX
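To illustrate how these pieces fit together, here is a minimal Python sketch of a request to this endpoint using the requests library; the model identifier and the shape of the response body are assumptions made for the example, not taken from this page.

```python
import requests

API_KEY = "avian-XX_XXXXXXXXXXXXXXXXX"  # your Avian API key

# Minimal completions request. The model id below is a placeholder assumption;
# substitute whichever model you have access to.
response = requests.post(
    "https://api.avian.io/v1/completions",
    headers={
        "Authorization": f"Bearer {API_KEY}",
        "Content-Type": "application/json",
    },
    json={
        "model": "Meta-Llama-3.1-8B-Instruct",  # placeholder model id
        "prompt": "Once upon a time",
    },
)
response.raise_for_status()
print(response.json())
```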
The prompt to generate completions for.
For base models, this may be a free-form string, such as one or more paragraphs of text from a novel (in which case the model may generate a continuation of the text).
For chat or instruct models, the messages should ideally be formatted according to the model's preferred format. For the llama-3.1-instruct family of models, the prompt format is specified in the model card (link).
Note that if you use our chat endpoint (link), we will automatically format the messages for you.
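As a rough sketch of what an instruct-formatted prompt looks like, the snippet below reconstructs the Llama 3.1 instruct chat template from memory; treat the special tokens as an assumption and verify them against the model card linked above.

```python
# Rough sketch of the Llama 3.1 instruct prompt format; verify against the model card,
# since the special tokens here are reproduced from memory rather than from this page.
def format_llama31_prompt(system: str, user: str) -> str:
    return (
        "<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\n"
        f"{system}<|eot_id|>"
        "<|start_header_id|>user<|end_header_id|>\n\n"
        f"{user}<|eot_id|>"
        "<|start_header_id|>assistant<|end_header_id|>\n\n"
    )

prompt = format_llama31_prompt(
    "You are a helpful assistant.",
    "Summarize the plot of Hamlet in two sentences.",
)
```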
The number of completions to generate.
How many tokens the model is allowed to generate before being stopped.
Note that this only controls how many tokens can be generated (before the response is cut off), not how many will be generated. Setting this to a high value will not make the replies more verbose, and conversely, setting it to a low value will not make the replies more concise.
The maximum value is 128K tokens (128 * 1024 = 131072).
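For illustration, the two parameters above might be sent in a request body like the following; the field names n and max_tokens are assumed (OpenAI-style) rather than quoted from this page, so check them against the parameter names shown in this reference.

```python
# Field names are assumptions (OpenAI-style); adjust to the names used by the API.
payload = {
    "model": "Meta-Llama-3.1-8B-Instruct",  # placeholder model id
    "prompt": "Write a haiku about mountains.",
    "n": 2,             # number of completions to generate
    "max_tokens": 256,  # upper bound on generated tokens, not a target length
}
```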
Temperature controls how much the model is allowed to deviate from standard behavior.
Lower values will make the model more conservative in its responses, and values like 0 will make the model deterministic (i.e. it will always generate the same output for the same input).
Higher values, like 0.8, will make the model more creative in its responses, meaning it will take more risks and generate more unexpected outputs.
Values above 1 are not recommended, as they can lead to nonsensical outputs, but we allow them for experimentation.
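For example, a deterministic, conservative request versus a more creative one might look like this; the field names and values are illustrative assumptions.

```python
# temperature = 0 for repeatable output, higher values for more varied output.
deterministic = {
    "model": "Meta-Llama-3.1-8B-Instruct",  # placeholder model id
    "prompt": "List three uses for a paperclip.",
    "temperature": 0,
}

creative = {
    "model": "Meta-Llama-3.1-8B-Instruct",  # placeholder model id
    "prompt": "List three uses for a paperclip.",
    "temperature": 0.8,
}
```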
Whether to stream the response or not.
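A sketch of consuming a streamed response with requests, assuming the endpoint emits server-sent events ("data: {...}" lines ending with "data: [DONE]") the way OpenAI-compatible APIs usually do; that framing is an assumption, not something confirmed on this page.

```python
import json
import requests

API_KEY = "avian-XX_XXXXXXXXXXXXXXXXX"  # your Avian API key

# Assumes an SSE-style stream of "data: {...}" lines terminated by "data: [DONE]".
with requests.post(
    "https://api.avian.io/v1/completions",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={
        "model": "Meta-Llama-3.1-8B-Instruct",  # placeholder model id
        "prompt": "Tell me a short story about a lighthouse.",
        "stream": True,
    },
    stream=True,  # let requests hand us the body incrementally
) as response:
    response.raise_for_status()
    for line in response.iter_lines():
        if not line:
            continue
        decoded = line.decode("utf-8")
        if decoded.startswith("data: "):
            chunk = decoded[len("data: "):]
            if chunk == "[DONE]":
                break
            print(json.loads(chunk))
```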
Frequency penalty controls how much the model is allowed to repeat itself.
Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the likelihood of the model repeating the same lines over and over.
Presence penalty controls the model's likelihood of talking about new topics.
Positive values penalize tokens based on whether they have already appeared in the text so far, making the model more likely to introduce new topics.
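To illustrate both penalties in one request body; the values and the frequency_penalty / presence_penalty field names are assumptions based on the usual OpenAI-style naming.

```python
payload = {
    "model": "Meta-Llama-3.1-8B-Instruct",  # placeholder model id
    "prompt": "Brainstorm names for a coffee shop.",
    "frequency_penalty": 0.5,  # discourage repeating the same tokens/lines
    "presence_penalty": 0.6,   # nudge the model toward new topics
}
```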
top_p is an alternative to sampling with temperature: a top_p value of 0.1 means the model will be forced to choose from the tokens that make up the top 10% of the probability distribution, making the output more conservative and less creative.
Low values of top_p have a similar effect to low values of temperature (i.e. they make the model more conservative).
While we allow you to set both temperature and top_p at the same time, we recommend adjusting only one of them, as they can interfere with each other in ways that are very likely not what you want.
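Following the recommendation above, a sketch that restricts sampling with top_p only and leaves temperature at its default; the field names are assumed.

```python
payload = {
    "model": "Meta-Llama-3.1-8B-Instruct",  # placeholder model id
    "prompt": "Suggest a title for a mystery novel.",
    "top_p": 0.1,  # sample only from the top 10% of probability mass; temperature left unset
}
```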
Seed is an arbitrary integer that allows you to make the model's output deterministic. Using the same seed will make the model generate the same output for the same input - though note that this is not always guaranteed, as different versions of the model/hardware/libraries may produce slightly different outputs.
If you do not provide a seed, we will generate one for you, making the output non-deterministic.
Note that lower temperature values will tend to make the output more similar, regardless of the seed.
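To illustrate, repeating the same request with the same seed (and the same prompt and sampling parameters) should usually reproduce the same completion; the seed field name and the values here are assumptions.

```python
payload = {
    "model": "Meta-Llama-3.1-8B-Instruct",  # placeholder model id
    "prompt": "Pick a random animal and describe it in one sentence.",
    "temperature": 0.7,
    "seed": 1234,  # send the same seed again to (usually) reproduce this output
}
```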