Plain HTTP and JSON. Pick a model, pass your key, go. No SDKs to install, no vendor lock-in, no hidden token sampling.
1curl https://llmrack.com/v1/chat/completions \2 -H "Authorization: Bearer $LLMRACK_API_KEY" \3 -H "Content-Type: application/json" \4 -d '{5 "model": "llama-3.1-8b",6 "messages": [{"role":"user","content":"Explain RAG in 2 sentences."}]7 }'
Quantized for efficient CPU serving. Switch models by changing a string.
Each tier sets a daily token allowance and a per-minute request rate. Hit the cap and requests return 429 until UTC midnight. No surprise invoices.