OpenAI-compatible REST API. Bring your existing client, swap the base URL, pick a model. Streaming, tool use, structured output — all of it.
```python
from openai import OpenAI

client = OpenAI(
    base_url="https://llmrack.com/v1",
    api_key="rl_live_...",
)

stream = client.chat.completions.create(
    model="llama-3.1-8b",
    stream=True,
    messages=[{"role": "user", "content": "Explain RAG in 2 sentences."}],
)

for chunk in stream:
    print(chunk.choices[0].delta.content or "", end="", flush=True)
```
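Structured output works through the same client. A minimal sketch of a JSON-mode request, reusing the model name from the snippet above; whether llmrack forwards OpenAI's `response_format` field exactly as shown is an assumption, and the send line is commented out so the sketch stands alone:

```python
# Sketch: an OpenAI-style JSON-mode ("structured output") request payload.
# Assumption: llmrack passes response_format through unchanged.
request = {
    "model": "llama-3.1-8b",
    "response_format": {"type": "json_object"},
    "messages": [
        {
            "role": "user",
            "content": "Return a JSON object with keys 'term' and 'definition' for RAG.",
        }
    ],
}

# Using the client from the snippet above:
# resp = client.chat.completions.create(**request)
# print(resp.choices[0].message.content)
```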
Quantized for efficient CPU serving. Switch models by changing a string.
Tiers unlock higher rate limits and throughput; per-token pricing is the same on every tier.