
v0.9.4 · 5 models

/ llm infrastructure

Open-source LLMs,
served fast. Priced honestly.

Production inference for Llama, Mistral, Qwen, and Phi. One API. Flat monthly plans from $15/mo, plus a free tier with no card required.


[01] Quickstart

One endpoint.
Any open model.

Plain HTTP and JSON. Pick a model, pass your key, go. No SDKs to install, no vendor lock-in, no hidden token sampling.


No SDK required

Any HTTP client works — curl, requests, fetch.

Streaming by default

Server-Sent Events, tokens as they're generated.

Self-hosted open weights

Llama, Mistral, Qwen, Phi — served on infrastructure we control.

curl https://llmrack.com/v1/chat/completions \
  -H "Authorization: Bearer $LLMRACK_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama-3.1-8b",
    "messages": [{"role":"user","content":"Explain RAG in 2 sentences."}]
  }'
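Streaming uses the same plain curl call. This is a minimal sketch that assumes the endpoint follows the OpenAI-style convention of a "stream": true request field and a final "data: [DONE]" event. The FAQ's note that stream=True works suggests this, but confirm it in /docs. The helper names sse_payloads and stream_chat are hypothetical. The sketch prints each event's JSON payload on its own line.

```shell
#!/bin/sh
# Sketch only. ASSUMPTION: OpenAI-style SSE framing, i.e. "data: {...}" events
# terminated by "data: [DONE]". Helper names are hypothetical.

# sse_payloads: read an SSE stream on stdin and print each "data:" payload on
# its own line. Blank lines, ": comment" keep-alives and other fields are skipped.
sse_payloads() {
  cr=$(printf '\r')
  while IFS= read -r line; do
    line=${line%"$cr"}               # tolerate CRLF line endings
    case "$line" in
      data:*) ;;                     # keep data lines
      *) continue ;;                 # skip everything else
    esac
    payload=${line#data:}
    payload=${payload# }             # SSE allows one optional space after the colon
    [ "$payload" = "[DONE]" ] && break
    printf '%s\n' "$payload"
  done
}

# stream_chat: same request as the quickstart, with streaming requested.
# -N (--no-buffer) makes curl pass bytes through as they arrive.
stream_chat() {
  curl -sN https://llmrack.com/v1/chat/completions \
    -H "Authorization: Bearer $LLMRACK_API_KEY" \
    -H "Content-Type: application/json" \
    -d '{
      "model": "llama-3.1-8b",
      "stream": true,
      "messages": [{"role":"user","content":"Explain RAG in 2 sentences."}]
    }' | sse_payloads
}
```

Each printed line is one JSON chunk, so you can pipe it into whatever JSON tool you already use to pull out the token text.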

[02] Models

Open models.
One bill.

Quantized for efficient CPU serving. Switch models by changing the "model" string in your request.

8B · 128k ctx · ~18 tok/s on CPU · included on all plans

7B · 32k ctx · ~24 tok/s on CPU · included on all plans

7B · 32k ctx · ~20 tok/s on CPU · included on all plans

3.8B · 128k ctx · ~280 tok/s on CPU · included on all plans
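The model is just a field in the JSON body, so switching models means passing a different string. This minimal sketch assumes only the request shape from the quickstart. The helper names json_escape, chat_body and ask are hypothetical. The only model ID that appears on this page is llama-3.1-8b, so look up the IDs of the other models in /docs.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical; only "llama-3.1-8b" is a model ID
# taken from this page.

# json_escape: escape backslashes and double quotes for use inside a JSON string.
# Newlines and other control characters are NOT handled in this sketch.
json_escape() {
  printf '%s' "$1" | sed 's/\\/\\\\/g; s/"/\\"/g'
}

# chat_body MODEL PROMPT -> one-line JSON request body
chat_body() {
  printf '{"model":"%s","messages":[{"role":"user","content":"%s"}]}' \
    "$(json_escape "$1")" "$(json_escape "$2")"
}

# ask MODEL PROMPT -> POST the body to the chat completions endpoint
# (--data-binary @- reads the request body from stdin)
ask() {
  chat_body "$1" "$2" | curl -s https://llmrack.com/v1/chat/completions \
    -H "Authorization: Bearer $LLMRACK_API_KEY" \
    -H "Content-Type: application/json" \
    --data-binary @-
}
```

For example, ask llama-3.1-8b "Explain RAG in 2 sentences." sends the same request as the quickstart call. To try a different model, change only the first argument.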

INCLUDED ON ALL PLANS

→

[03] Platform

Built for production
from the first token.

Quantized Q4 serving

Q4_K_M weights, CPU-optimized. Warm keep-alive between requests.

Multiple open models

Llama, Mistral, Qwen, Phi. Pick one by changing the "model" string.

Flat-rate pricing

Predictable monthly plans with daily token caps. No surprises on the invoice.

Plain HTTP API

REST + JSON. Streaming via Server-Sent Events. No SDK required.

No third-party model hops

All inference runs on infrastructure we operate. Your prompts don't get forwarded to another vendor.

Private by default

Prompts and completions are never stored; only token counts are kept, for billing. Your data never feeds a training pipeline.

[04] Pricing

Flat monthly plans.
Daily caps, no overage.

Each tier sets a daily token allowance and a per-minute request rate. Once you hit the daily cap, requests return HTTP 429 until UTC midnight. No surprise invoices.
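Because the cap resets at UTC midnight, a client that receives a 429 can work out how long to wait. This is a minimal sketch. The helpers secs_until_utc_midnight and post_chat are hypothetical. The sketch also assumes that the per-minute limit may return 429 too, which this page does not state. If that is true, a 429 alone does not tell you which limit you hit.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical.
# Unix time counts exactly 86400 s per day and day 0 began at 1970-01-01T00:00Z,
# so the next UTC midnight is a single modulo away.

# secs_until_utc_midnight [EPOCH] -> seconds until the next UTC midnight (1..86400).
# Defaults to the current time; "date +%s" is supported by GNU and BSD date.
secs_until_utc_midnight() {
  now=${1:-$(date -u +%s)}
  echo $(( 86400 - now % 86400 ))
}

# post_chat BODY -> prints the response body.
# Returns 0 on 2xx, 2 on 429, and 1 otherwise.
# ASSUMPTION: the per-minute limit may also answer 429; this only reports the status.
post_chat() {
  tmp=$(mktemp) || return 1
  code=$(curl -s -o "$tmp" -w '%{http_code}' https://llmrack.com/v1/chat/completions \
    -H "Authorization: Bearer $LLMRACK_API_KEY" \
    -H "Content-Type: application/json" \
    -d "$1")
  cat "$tmp"
  rm -f "$tmp"
  case "$code" in
    2??) return 0 ;;
    429) return 2 ;;
    *)   return 1 ;;
  esac
}
```

One reasonable policy under that assumption works in two steps:

1. On a 429, back off briefly and retry, to ride out the per-minute limit.
2. If 429s persist, treat them as the daily cap and sleep "$(secs_until_utc_midnight)" seconds before trying again.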

Free

$0 forever

For testing and light experimentation

10,000 tokens / day

10 requests / minute

All open models

1 API key

Community support


Pro (popular)

$15 / month

For developers and daily use

550,000 tokens / day

100 requests / minute

Unlimited API keys

Usage analytics + invoices

All open models


Business

$65 / month

For production and high-volume usage

5,000,000 tokens / day

500 requests / minute

Unlimited API keys

Priority email support

All open models

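Each plan has two limits that interact, and either one can cap your daily request count:

- The daily token cap divided by your average request size.
- The per-minute rate multiplied by 1,440 minutes.

This is a small worked sketch that uses the plan numbers above. The helpers max_requests_per_day and compare_plans are hypothetical. The sketch also assumes that the daily allowance counts prompt and completion tokens together, which the page does not specify.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical.
# ASSUMPTION: the daily allowance covers prompt + completion tokens combined.

# max_requests_per_day DAILY_TOKENS AVG_TOKENS_PER_REQUEST REQ_PER_MIN
# Prints whichever ceiling is lower: the token budget or the request rate.
# AVG_TOKENS_PER_REQUEST must be a positive integer.
max_requests_per_day() {
  by_tokens=$(( $1 / $2 ))
  by_rate=$(( $3 * 1440 ))           # 1440 minutes in a day
  if [ "$by_tokens" -lt "$by_rate" ]; then
    echo "$by_tokens"
  else
    echo "$by_rate"
  fi
}

# compare_plans [AVG_TOKENS] -> per-plan ceiling, using the limits on the pricing cards
compare_plans() {
  avg=${1:-500}
  printf 'Free:     %s req/day\n' "$(max_requests_per_day 10000 "$avg" 10)"
  printf 'Pro:      %s req/day\n' "$(max_requests_per_day 550000 "$avg" 100)"
  printf 'Business: %s req/day\n' "$(max_requests_per_day 5000000 "$avg" 500)"
}
```

At 500 tokens per request, compare_plans 500 reports 20, 1100 and 10000 requests per day for Free, Pro and Business. On every plan, the token cap is reached long before the request rate. For the rate to become the binding limit on Business, requests would have to average fewer than 7 tokens. In practice, the per-minute limit mostly governs bursts.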

[05] FAQ

Questions we
actually get asked.

Is the API OpenAI-compatible?

Yes. Point your OpenAI client at https://llmrack.com/v1, swap in your llmrack API key, and chat.completions.create works unchanged. That includes stream=True, tools (on models that support them), and response_format for JSON output. The one exception is n > 1; call the endpoint once per completion instead. See the API capabilities table in /docs for the full field-by-field breakdown.
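For existing OpenAI SDK code, you can often make the switch with environment variables alone. This sketch assumes a client that reads OPENAI_BASE_URL and OPENAI_API_KEY. The official openai Python (v1+) and Node libraries read both when no base URL or key is passed in code; other clients may differ. The helper n_completions is hypothetical. It shows the one-call-per-completion workaround for n > 1.

```shell
#!/bin/sh
# Sketch only. ASSUMPTION: your OpenAI client falls back to these variables
# when no base URL / API key is set in code.
export OPENAI_BASE_URL="https://llmrack.com/v1"
export OPENAI_API_KEY="${LLMRACK_API_KEY:-}"

# n_completions N PROMPT -> N raw JSON responses, each followed by a newline.
# n > 1 is unsupported, so this makes N independent calls.
# PROMPT is inserted verbatim: avoid double quotes and backslashes in this sketch.
n_completions() {
  i=0
  while [ "$i" -lt "$1" ]; do
    curl -s "$OPENAI_BASE_URL/chat/completions" \
      -H "Authorization: Bearer $OPENAI_API_KEY" \
      -H "Content-Type: application/json" \
      -d "{\"model\":\"llama-3.1-8b\",\"messages\":[{\"role\":\"user\",\"content\":\"$2\"}]}"
    echo
    i=$((i + 1))
  done
}
```

Source the script (with ".") rather than executing it, so that the exports reach the shell that starts your app.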

/ ship today

Start building.
Free tier, no card.

