Avian API Documentation

Dedicated Deployments

Deploy any Hugging Face LLM at 3-10x speed



Dedicated deployments allow you to use Avian.io's technology to create production-grade deployments for almost any Hugging Face LLM. We support over 100 architectures and can optimise your LLM to run at speeds of up to 600 tokens per second.

To use dedicated deployments:

1) Visit https://new.avian.io/dedicated-deployments

2) Choose the model you want to deploy, for example meta-llama/Meta-Llama-3.1-8B-Instruct

3) Optionally choose how many GPUs, and which kind, to deploy with; otherwise the most optimal and cost-effective configuration for your chosen model is selected automatically. Click "Deploy"

4) Retrieve the relevant code from the "View Code" section, or chat with the model directly on the platform
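Once deployed, the model is called like any OpenAI-compatible chat completions endpoint. The sketch below, using only the Python standard library, shows roughly what the "View Code" snippet looks like; the base URL (`https://api.avian.io/v1`) and the `AVIAN_API_KEY` environment variable are assumptions here, so copy the exact values from your deployment's "View Code" section.

```python
# Minimal sketch of calling a dedicated deployment through an
# OpenAI-compatible chat completions endpoint. The base URL, model
# name, and AVIAN_API_KEY env var are assumptions; use the exact
# values shown in your deployment's "View Code" section.
import json
import os
import urllib.request


def build_chat_request(
    model: str,
    prompt: str,
    base_url: str = "https://api.avian.io/v1",  # assumed base URL
) -> urllib.request.Request:
    """Build a POST request for the chat completions endpoint."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {os.environ.get('AVIAN_API_KEY', '')}",
        },
        method="POST",
    )


req = build_chat_request(
    "meta-llama/Meta-Llama-3.1-8B-Instruct",
    "Say hello in one sentence.",
)

# To actually send the request (requires a valid API key):
# with urllib.request.urlopen(req) as resp:
#     reply = json.load(resp)["choices"][0]["message"]["content"]
```

Because the endpoint follows the OpenAI wire format, the official `openai` Python client also works by pointing its `base_url` at the deployment.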

You can also furnish a private Hugging Face token if your repository is gated, or provide a LoRA adapter to merge into the base model.

Deployments are production grade and come with autoscaling enabled by default.

You can view pricing on the Models and Pricing page.
[Screenshot: Dedicated Deployments interface (https://new.avian.io/dedicated-deployments)]