Dedicated Deployments
Deploy any Huggingface LLM at 3-10x Speeds
Last updated
Dedicated deployments allow you to use Avian.io's technology to create production-grade deployments for almost any Huggingface LLM. We support over 100 architectures, and can speed up and optimise your LLM to run at speeds of up to 600 tokens per second.
To use dedicated deployments:
1) Visit https://new.avian.io/dedicated-deployments
2) Choose the model you want to deploy, for example meta-llama/Meta-Llama-3.1-8B-Instruct
3) Optionally choose how many GPUs, and of what kind, to deploy with; otherwise, the most optimal and cost-effective configuration for your chosen model will be selected for you. Click "Deploy"
4) Retrieve the relevant code from the "View Code" section, or "Chat" with the model on the platform
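As a rough sketch of what calling a deployment from your own code can look like: the snippet below builds an OpenAI-style chat-completion payload. The base URL, the API-key placeholder, and the helper function are illustrative assumptions, not Avian.io's published API; use the exact snippet from the "View Code" section for your deployment.

```python
import json

# Hypothetical values -- substitute the endpoint and key shown in "View Code".
AVIAN_BASE_URL = "https://api.avian.io/v1"  # assumed OpenAI-compatible endpoint
API_KEY = "YOUR_AVIAN_API_KEY"


def build_chat_request(model: str, prompt: str, max_tokens: int = 256) -> dict:
    """Build an OpenAI-style chat-completion payload for a dedicated deployment."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }


payload = build_chat_request("meta-llama/Meta-Llama-3.1-8B-Instruct", "Hello!")

# Send with any HTTP client, e.g.:
#   requests.post(f"{AVIAN_BASE_URL}/chat/completions",
#                 headers={"Authorization": f"Bearer {API_KEY}"},
#                 json=payload)
print(json.dumps(payload, indent=2))
```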
You can also provide a private HF token if your repository is gated, or supply a LoRA adapter to merge into the base model.
Deployments are production grade and include autoscaling by default.
You can view pricing on the Models and Pricing page.