LiteLLM VPS Docker
LiteLLM is the #1 open-source AI gateway and proxy server that unifies 100+ LLM providers.
LiteLLM is an open-source AI gateway and proxy server that exposes a unified, OpenAI-compatible API for calling 100+ large language model (LLM) providers.
What makes LiteLLM fundamentally different from cloud-managed alternatives like OpenRouter is its self-hosted, transparent architecture. LiteLLM is an MIT-licensed Python library and proxy server that you run entirely on your own infrastructure; no third party sits in the request path, and all your request logs, provider API keys, and spend data stay within systems you control.
LiteLLM is built for developers, AI engineers, and platform teams who need to manage multiple LLM providers at scale.
LiteLLM provides a single OpenAI-style endpoint that automatically translates requests for over 100 different LLM providers. This allows you to swap models by simply changing a name string in your config without rewriting any application code or updating multiple SDKs.
The built-in router manages traffic across multiple deployments with smart features like round-robin distribution and least-latency routing. It ensures high reliability by automatically failing over to secondary providers or retrying with exponential backoff if a primary service hits rate limits.
The proxy tracks every token used to enforce global or per-user monthly budgets and standardizes all usage logs into a PostgreSQL database. You can gain full visibility into spending through detailed cost attribution, ensuring no single project or team exceeds its financial cap.
You can issue scoped virtual keys to different teams with specific model allowlists, rate limits, and expiration dates without exposing master provider credentials. This system supports a full hierarchy with inherited limits, making it easy to manage AI access for internal departments or external partners.
LiteLLM maintains a complete audit trail of every interaction for debugging and compliance, with native support for tools like Langfuse and Datadog. The admin UI provides a real-time view of these logs and includes Swagger documentation to simplify the integration of new product surfaces.
By using Redis for response caching and distributed rate-limit coordination, LiteLLM significantly reduces latency and token costs for repeated queries. This architecture ensures that horizontally scaled deployments stay within global provider limits even under heavy, concurrent traffic.
The platform supports the A2A protocol, allowing you to invoke complex agent systems from LangGraph or Vertex AI through the same unified interface. This transforms the proxy into a centralized orchestration layer that can route requests to both standalone models and autonomous agents.
A comprehensive web suite allows you to manage keys, configure models, and monitor spending through a point-and-click interface. It also includes a built-in chat UI for rapid prototyping, making the gateway accessible to non-developers while serving technical teams with automated API docs.
Licensed under MIT, LiteLLM offers complete data sovereignty with a fully auditable codebase and no hidden per-token fees or usage caps. Since it is self-hosted, your API keys and conversation logs remain entirely under your control, free from third-party SaaS access.
A VPS ensures your LiteLLM instance stays active around the clock, keeping your unified AI endpoint accessible to all your applications at all times - turning your local proxy into a production-grade service.
A VPS provides guaranteed CPU and RAM allocation, ensuring consistent low‑latency performance even when handling high‑volume production workloads with multiple concurrent users.
When you self-host LiteLLM on your own VPS, your request logs, API keys, spend data, and usage patterns stay exclusively on your infrastructure, never passing through third-party cloud services. This is essential for organizations handling sensitive data, intellectual property, or anything subject to GDPR, HIPAA, or other compliance frameworks.
Start with a basic VPS plan and scale resources as your AI usage grows. As you add more team members, integrate additional LLM providers, or increase request volumes, you can upgrade CPU, RAM, and storage without migrating infrastructure. LiteLLM's architecture separates the proxy server from the database and cache, allowing you to scale each component independently as your needs evolve.
A private university implemented LiteLLM to provide privacy-preserving chatbot access to students and employees across multiple departments. Each course or department received its own virtual key with a specific budget, and the university maintained full control over which models were accessible, preventing students from accidentally racking up high bills on expensive models.
Companies building AI-powered applications that cannot tolerate downtime use LiteLLM to configure automatic fallbacks across multiple providers. If OpenAI is rate-limited or experiencing an outage, LiteLLM automatically routes requests to Anthropic Claude or Google Gemini without any application-level changes.
LiteLLM serves as a self-hosted alternative for developers who require OpenRouter's unified API and load balancing without routing sensitive data through third-party cloud services. By deploying on private infrastructure, organizations maintain full data sovereignty while accessing OpenAI-compatible endpoints, fallbacks, and cost tracking entirely under their own control.
Teams utilize LiteLLM to unify cloud LLMs like Anthropic with local models such as Ollama and vLLM behind a single, consistent interface. This hybrid approach allows for routing sensitive tasks to local infrastructure for privacy while leveraging high-performance cloud models, all managed via centralized load balancing and cost tracking.
LiteLLM is an open-source AI gateway and proxy server that exposes a unified. It provides enterprise-grade features like load balancing, automatic fallbacks, cost tracking, virtual keys, and full request logging, all while running entirely on your own infrastructure.
LiteLLM supports over 100 providers including OpenAI (GPT-4, GPT-4 Turbo, GPT-3.5, o1), Anthropic Claude (all versions), Google Gemini, Groq, Cohere, Mistral, Azure OpenAI, AWS Bedrock, Vertex AI, DeepSeek, NVIDIA NIM, vLLM, HuggingFace, Sagemaker, Ollama (local models), and many more. It also supports custom OpenAI-compatible endpoints, so you can add any provider that exposes an OpenAI-style API.
Virtual keys are scoped API keys that you issue from the LiteLLM proxy to your team members, projects, or external partners. Each virtual key can be limited to specific models (allowlist), restricted by rate limits, capped by monthly budgets, and set to expire after a defined period. This allows you to give others access to LLMs through your unified endpoint without exposing your master provider keys, and you retain full visibility and control over their usage and spend.
For the Python SDK (no database), a VPS with 1GB of RAM and 1 vCPU is sufficient. For the full proxy server with PostgreSQL and Redis, at least 4GB of RAM and 2+ vCPUs are recommended. The Docker Compose stack requires approximately 3-4GB of disk space.
See our Cookie Policy
We value your input
Want us to follow up with an answer or a custom quote? Drop your email below. Totally optional.