Quickstart

LLMRack serves open-weight LLMs over a plain HTTP+JSON API at https://llmrack.com/v1. No SDK required — any HTTP client works.

1. Get an API key

  1. Sign up at llmrack.com/signup (free tier, no card).
  2. Open Dashboard → API Keys.
  3. Click Generate new key, name it, pick permissions, click Generate.
  4. Copy the key from the reveal modal immediately — it's shown once, then only a SHA-256 hash is stored. If lost, revoke and regenerate.
Keys look like rl_live_… (production) or rl_test_… (testing). Treat them like passwords; never commit them to git or ship them in client code.
export LLMRACK_API_KEY="rl_live_..."
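In code, read the key from the environment rather than hard-coding it. A minimal sketch (the `load_api_key` helper is illustrative, not part of any SDK) that fails fast when the variable is missing or doesn't match the `rl_live_…` / `rl_test_…` shape:

```python
# Sketch: load the key from the environment and fail fast if it's missing
# or doesn't look like an LLMRack key (rl_live_... / rl_test_...).
import os

def load_api_key(env_var: str = "LLMRACK_API_KEY") -> str:
    key = os.environ.get(env_var, "")
    if not key:
        raise RuntimeError(f"{env_var} is not set; export it first.")
    if not key.startswith(("rl_live_", "rl_test_")):
        raise RuntimeError(f"{env_var} doesn't look like an LLMRack key.")
    return key
```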

2. Send your first request

Plain cURL, Python requests, or Node fetch all work — no client library needed. Here's cURL:

curl https://llmrack.com/v1/chat/completions \
  -H "Authorization: Bearer $LLMRACK_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "phi-3-mini",
    "messages": [{"role":"user","content":"hi"}]
  }'

Start with phi-3-mini — it's the fastest model on CPU. Full model list below.
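The same request in Python with `requests`. The `build_request` helper is illustrative (not part of any SDK); it just assembles the headers and body shown in the cURL call, and the network call itself runs only when the script is executed directly:

```python
# Sketch: the cURL request above, in Python.
import os
import requests

BASE_URL = "https://llmrack.com/v1"

def build_request(model: str, prompt: str) -> tuple[dict, dict]:
    """Assemble (headers, body) for a chat completion request."""
    headers = {
        "Authorization": f"Bearer {os.environ['LLMRACK_API_KEY']}",
        "Content-Type": "application/json",
    }
    body = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return headers, body

if __name__ == "__main__":
    headers, body = build_request("phi-3-mini", "hi")
    r = requests.post(f"{BASE_URL}/chat/completions",
                      headers=headers, json=body, timeout=60)
    r.raise_for_status()
    print(r.json()["choices"][0]["message"]["content"])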

3. Stream tokens

Add "stream": true. The server returns Server-Sent Events — each data: line is one JSON chunk, and the stream terminates with data: [DONE].

import os, json, requests

with requests.post(
    "https://llmrack.com/v1/chat/completions",
    headers={"Authorization": f"Bearer {os.environ['LLMRACK_API_KEY']}"},
    json={
        "model": "phi-3-mini",
        "messages": [{"role": "user", "content": "Haiku about SSDs."}],
        "stream": True,
    },
    stream=True, timeout=60,
) as r:
    for raw in r.iter_lines():
        if not raw or not raw.startswith(b"data: "):
            continue
        payload = raw[6:]
        if payload == b"[DONE]":
            break
        chunk = json.loads(payload)
        delta = chunk["choices"][0]["delta"].get("content", "")
        print(delta, end="", flush=True)

4. Embeddings

Text embeddings come from nomic-embed — 768 dimensions, 8k context. Use them for semantic search, RAG, clustering.

curl https://llmrack.com/v1/embeddings \
  -H "Authorization: Bearer $LLMRACK_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "nomic-embed",
    "input": ["the quick brown fox", "hello world"]
  }'
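For semantic search you typically compare embeddings by cosine similarity. A sketch, assuming the response follows the OpenAI-style embeddings shape (`{"data": [{"embedding": [...]}, ...]}`); the `cosine` helper is plain Python, no numpy needed:

```python
# Sketch: embed two texts and compare them by cosine similarity.
import math
import os
import requests

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

if __name__ == "__main__":
    r = requests.post(
        "https://llmrack.com/v1/embeddings",
        headers={"Authorization": f"Bearer {os.environ['LLMRACK_API_KEY']}"},
        json={"model": "nomic-embed",
              "input": ["the quick brown fox", "hello world"]},
        timeout=60,
    )
    r.raise_for_status()
    v1, v2 = (item["embedding"] for item in r.json()["data"])
    print(f"cosine similarity: {cosine(v1, v2):.3f}")
```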

5. Available models

All prices in USD per 1M tokens. Q4_K_M quantization (except Nomic, F16).

Model id       Params   Context   In $/1M   Out $/1M
phi-3-mini     3.8B     128k      $0.04     $0.04
mistral-7b     7B       32k       $0.08     $0.10
llama-3.1-8b   8B       128k      $0.09     $0.11
qwen-2.5-7b    7B       32k       $0.10     $0.12
nomic-embed    137M     8k        $0.02     —

Live list at GET https://llmrack.com/v1/models (or the Models page).
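Fetching the live list from code might look like the sketch below. It assumes the endpoint returns the OpenAI-compatible list shape (`{"object": "list", "data": [{"id": ...}, ...]}`); `model_ids` is an illustrative helper:

```python
# Sketch: list available model ids from GET /v1/models.
import os
import requests

def model_ids(payload: dict) -> list[str]:
    """Pull the ids out of an OpenAI-style model-list response."""
    return [m["id"] for m in payload.get("data", [])]

if __name__ == "__main__":
    r = requests.get(
        "https://llmrack.com/v1/models",
        headers={"Authorization": f"Bearer {os.environ['LLMRACK_API_KEY']}"},
        timeout=30,
    )
    r.raise_for_status()
    print("\n".join(model_ids(r.json())))
```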

6. Authentication

Pass the API key as a bearer token:

Authorization: Bearer rl_live_...

Missing, malformed, revoked, or expired keys return 401 authentication_error.

7. Rate limits

Tier       Requests/min   Tokens/day   Monthly   Intended for
Free       10             10,000       $0        Testing and light experimentation
Pro        100            550,000      $15       Developers and daily use
Business   500            5,000,000    $65       Production and high-volume usage

Hitting a limit returns 429 with Retry-After (seconds). Upgrade at Dashboard → Billing.
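A simple way to honor `Retry-After` is to sleep for the advertised number of seconds and try again. The helpers below are an illustrative sketch, not an official client; `retry_delay` falls back to a default when the header is absent or not a plain number of seconds:

```python
# Sketch: retry on 429, sleeping for Retry-After seconds between attempts.
import time
import requests

def retry_delay(headers, default: float = 1.0) -> float:
    """Seconds to wait, read from the Retry-After header."""
    try:
        return max(float(headers.get("Retry-After", default)), 0.0)
    except ValueError:
        return default  # e.g. an HTTP-date value we don't parse here

def post_with_retry(url: str, attempts: int = 3, **kwargs) -> requests.Response:
    for _ in range(attempts - 1):
        r = requests.post(url, **kwargs)
        if r.status_code != 429:
            return r
        time.sleep(retry_delay(r.headers))
    return requests.post(url, **kwargs)  # last attempt, returned as-is
```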

8. Error codes

HTTP   Type                    When
400    invalid_request_error   Unknown model, malformed body, bad params.
401    authentication_error    Missing / bad / revoked API key.
429    rate_limit_exceeded     RPM or daily token budget hit.
502    api_error               Upstream model error — retry is safe.
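In client code it helps to separate retryable failures from caller errors. A sketch based on the table above, assuming error bodies follow the OpenAI-style shape (`{"error": {"type": "...", "message": "..."}}`); `classify_error` is an illustrative helper:

```python
# Sketch: classify a non-2xx response. 429 (after waiting) and 502 are
# worth retrying; 400 and 401 are caller errors and are not.
RETRYABLE_STATUSES = {429, 502}  # rate_limit_exceeded, api_error

def classify_error(status: int, body: dict) -> tuple[str, bool]:
    """Return (error type, retryable?) for an error response."""
    err_type = body.get("error", {}).get("type", "unknown_error")
    return err_type, status in RETRYABLE_STATUSES
```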

9. Use with agents & tools

Anything that supports a custom OpenAI-compatible endpoint works with LLMRack. Configure the tool with:

Base URL   https://llmrack.com/v1
API key    rl_live_…
Model      phi-3-mini · mistral-7b · llama-3.1-8b · qwen-2.5-7b · nomic-embed
Provider   openai (whenever the tool asks which provider — pick OpenAI, then override the URL)

Open WebUI

Settings → Connections → OpenAI API. Set the endpoint to https://llmrack.com/v1 and paste your key. Models appear automatically.

LibreChat

Add a custom endpoint to librechat.yaml:

endpoints:
  custom:
    - name: "LLMRack"
      apiKey: "${LLMRACK_API_KEY}"
      baseURL: "https://llmrack.com/v1"
      models:
        default: ["llama-3.1-8b", "mistral-7b", "phi-3-mini", "qwen-2.5-7b"]
      titleModel: "phi-3-mini"
      iconURL: "https://llmrack.com/favicon.ico"

Continue (VS Code / JetBrains)

Edit ~/.continue/config.json:

{
  "models": [
    {
      "title": "LLMRack · Llama 3.1 8B",
      "provider": "openai",
      "model": "llama-3.1-8b",
      "apiBase": "https://llmrack.com/v1",
      "apiKey": "rl_live_..."
    },
    {
      "title": "LLMRack · Phi-3 Mini (fast)",
      "provider": "openai",
      "model": "phi-3-mini",
      "apiBase": "https://llmrack.com/v1",
      "apiKey": "rl_live_..."
    }
  ],
  "embeddingsProvider": {
    "provider": "openai",
    "model": "nomic-embed",
    "apiBase": "https://llmrack.com/v1",
    "apiKey": "rl_live_..."
  }
}

Cursor

Cursor → Settings → Models → OpenAI API Key → enable Override OpenAI Base URL, set it to https://llmrack.com/v1, paste your LLMRack key. Add model names (e.g. llama-3.1-8b) in the custom models list.

LangChain (Python)

import os

from langchain_openai import ChatOpenAI

llm = ChatOpenAI(
    base_url="https://llmrack.com/v1",
    api_key=os.environ["LLMRACK_API_KEY"],
    model="llama-3.1-8b",
    streaming=True,
)

# Works with every LangChain agent, chain, and graph — CrewAI, LangGraph, etc.
for chunk in llm.stream("Summarize RAG in one paragraph."):
    print(chunk.content, end="", flush=True)

LlamaIndex

import os

from llama_index.core import Settings
from llama_index.llms.openai_like import OpenAILike
from llama_index.embeddings.openai_like import OpenAILikeEmbedding

Settings.llm = OpenAILike(
    model="llama-3.1-8b",
    api_base="https://llmrack.com/v1",
    api_key=os.environ["LLMRACK_API_KEY"],
    is_chat_model=True,
)
Settings.embed_model = OpenAILikeEmbedding(
    model_name="nomic-embed",
    api_base="https://llmrack.com/v1",
    api_key=os.environ["LLMRACK_API_KEY"],
)

AnythingLLM

Settings → LLM Provider → Generic OpenAI. Base URL https://llmrack.com/v1, API key rl_live_…, model llama-3.1-8b. For embeddings choose Generic OpenAI with model nomic-embed.

n8n

In the OpenAI or OpenAI Chat Model node, open the credential, enable Custom API base URL, set it to https://llmrack.com/v1, and paste your LLMRack key.

Anything else

If the tool accepts a custom OpenAI base URL, LLMRack works. If the option is labeled differently ("OpenAI-compatible server", "OAI proxy", "custom endpoint", etc.), look for the place to override the URL — https://llmrack.com/v1 — then paste your rl_live_… key.

10. Already using OpenAI?

LLMRack's request/response shapes match OpenAI's, so any OpenAI client library works as a drop-in — point it at our base URL and pass an LLMRack key. This is optional; the plain-HTTP examples above are the canonical path.

# if you already have the openai package installed and want to reuse it
import os

from openai import OpenAI

client = OpenAI(
    base_url="https://llmrack.com/v1",
    api_key=os.environ["LLMRACK_API_KEY"],
)

r = client.chat.completions.create(
    model="llama-3.1-8b",
    messages=[{"role": "user", "content": "hi"}],
)

Model names differ (llama-3.1-8b vs gpt-4o), but streaming, tool calls, and JSON mode all behave the same way.

Try it without writing code. The Playground lets you hit any model with your key from the browser.