QikLM Docs | Setup, Configuration & API Reference for llama.cpp & vLLM

Requirements

An inference engine: a llama-server build from llama.cpp, or a vLLM install. For llama.cpp you don't need one ahead of time: use Managed Install to let QikLM download an official release straight from the llama.cpp GitHub releases page, or point it at any build you already have. For vLLM, point QikLM at your existing install.
GGUF models for llama.cpp (browse and download them via the built-in HF Explorer), or Hugging Face / safetensors models for vLLM.

Unlike bundled apps, QikLM never locks you to one embedded engine. You choose the build, and the engine paths and model storage locations are fully configurable via the Admin Dashboard after the first launch.

Package Installation

QikLM is distributed as professional native packages for macOS, Linux, and Windows to ensure seamless background operation.

macOS (.pkg)

The .pkg installer is the recommended way to install QikLM on macOS: double-click it and follow the prompts. A .dmg is also available on the downloads page if you prefer to drag the binary into place yourself.

qiklm_darwin_arm64.pkg # Apple Silicon qiklm_darwin_amd64.pkg # Intel

Linux (.deb / .rpm)

Installing via package manager automatically sets up QikLM as a background systemd service. No manual configuration is required.

sudo dpkg -i qiklm.deb # Debian/Ubuntu sudo rpm -i qiklm.rpm # Fedora/RHEL/SUSE

Windows

Download the .exe and run it directly. To run QikLM as a background service, use the PowerShell New-Service cmdlet from an Administrator terminal.

Note for Windows users: Microsoft Edge blocks downloads of new apps (even those that are digitally signed) with no way to bypass it unless you disable SmartScreen. We recommend using Chrome, Brave, or Firefox to download QikLM.

New-Service -Name "QikLM" `
            -BinaryPathName "C:\path\to\qiklm.exe" `
            -DisplayName "QikLM Orchestrator" `
            -StartupType Automatic

The Web UIs

Admin Dashboard

The mission control center for managing engines, profiles, and your model library.

http://localhost:8192/admin/

PromptUI

Premium chat GUI and coding workspace with persistent projects and multimodal support.

http://localhost:8192/promptui/

Dashboard Features

Profiles: Create saved model profiles with per-model runtime settings and CLI flags.
Library: Browse and manage local GGUF files with automated discovery.
HF Explorer: Search Hugging Face repositories and download models directly into your library.
Access Control: Manage API access, LAN binding, and admin authentication.

PromptUI Workspace

PromptUI is a premium coding workspace designed to be useful while you work, not just a place to send messages. The built-in Canvas can open code blocks from a chat, let you edit them inline, preview HTML pages, and send the edited code back into the conversation for another pass.

Search & Context

Ground your models in real-time data using Brave Search with manual, automatic, or always-on modes. Optionally inject dynamic context via HTTP MCP resources.

Organization

Keep chat history searchable and organized by project. Manage persistent work threads and multimodal experiments in dedicated workspaces.

Managing Engine Builds

In QikLM, a build is the local engine program that actually runs a model, either a llama-server binary from llama.cpp or a vLLM install. For llama.cpp you can keep up to five different builds registered simultaneously, allowing you to test newer features or machine-specific optimizations without reconfiguring your clients. vLLM is registered as a single install you point QikLM at.

Managed Install (llama.cpp)

QikLM can download, unpack, and register a specific llama.cpp release for you automatically via the Dashboard.

Local Registration

If you already have a llama.cpp binary or a vLLM install on disk, use Add Build to register its path without moving any files.

Default vs. Profile Builds

The Default Build is used whenever a profile does not specify a specific engine. You can override this inside individual model profiles, allowing a newer model to use a newer build while keeping your stable models on a daily driver.

Note: Deleting a build entry only removes its registration from QikLM settings; it does not delete the binary from your storage.

How Profiles Work

A profile is a saved configuration for running or reaching a model. Instead of manual terminal flags, you give your setup a name (e.g., fast-chat or vision-large) and let QikLM handle the orchestration.

Model Sources

Local GGUF (llama.cpp): A model file stored in your QikLM library. These profiles can select specific registered builds.
Hugging Face (vLLM): A Hugging Face or safetensors model served by your vLLM install.
Remote Server: A llama.cpp or vLLM endpoint already running elsewhere. These profiles bypass local builds as the model is already being served.

Granular Control & Inheritance

Profiles manage GPU layers, context size, threads, and custom CLI flags. Think of your global Engine Settings as the "house settings." A profile can either borrow these defaults or carry its own specific overrides.

Automated Routing

When a client requests a model profile, QikLM automatically starts or switches the managed engine, builds the precise launch command, waits for readiness, and then forwards your request. For remote profiles, QikLM acts as a transparent proxy to the external endpoint.

Paths & Variables

QikLM uses platform-standard directories for configuration and data. You can inspect your specific paths using the command line:

qiklm paths

Default Directory Map

OS	Config Root	Data Root
Linux (User)	`~/.config/qiklm/`	`~/.local/share/qiklm/`
Linux (Systemd)	`/var/lib/qiklm/config/`	`/var/lib/qiklm/data/`
macOS	`~/Library/Application Support/qiklm/`	`~/Library/Application Support/qiklm/data/`
Windows	`%AppData%\qiklm\`	`%LocalAppData%\qiklm\`

Environment Variables

QIKLM_HOME Override the custom root for all config, data, and state.
QIKLM_PROXY_PORT Set the server port (Default: 8192).
QIKLM_BIND_MODE Set access mode: local_only, lan, or custom.

Remote Access

QikLM is a web application, which means any device that can reach your host can use the full Admin Dashboard and PromptUI. Switch the bind mode to LAN in Engine Settings and you're done.

LAN / Home Lab

Select LAN bind mode from Engine Settings. QikLM becomes accessible to every device on your local network.

Tailscale / VPN

Already running Tailscale, WireGuard, or ZeroTier? Set bind mode to LAN and QikLM is instantly available across your entire mesh, with no port forwarding, no tunnels, no extra configuration.

LAN Mode (Environment Override)

To start QikLM in LAN mode from the command line, listening on all interfaces:

QIKLM_BIND_MODE=lan ./qiklm

Custom Bind (Advanced)

To bind to a specific IP address (e.g., for a dedicated server or multi-homed setup):

QIKLM_BIND_MODE=custom QIKLM_BIND_ADDRESS=192.168.0.10 QIKLM_PROXY_PORT=8192 ./qiklm

Compatibility API

Any client that speaks the OpenAI or Anthropic standard can talk to QikLM through its stable local endpoint. Point the client at your QikLM base URL, authenticate with your QikLM API key, and you are connected.

Base URL

http://127.0.0.1:8192/v1

Swap 127.0.0.1 for your host's LAN or VPN address to connect from another device.

API Key

sk-qiklm-...

Generate your key in Account Settings. It protects your account when used with external harnesses, and you can regenerate it at any time to instantly revoke old access.

OpenAI Standard

/v1/chat/completions
/v1/models
/v1/responses

Anthropic Standard

/v1/messages

Example: OpenCode

Add QikLM as a provider in your OpenCode config. The same pattern applies to any OpenAI-compatible tool: set the base URL and your API key, then list the models you want to expose.

{
  "$schema": "https://opencode.ai/config.json",
  "provider": {
    "qiklm": {
      "npm": "@ai-sdk/openai-compatible",
      "name": "QikLM",
      "options": {
        "baseURL": "http://127.0.0.1:8192/v1",
        "apiKey": "sk-qiklm-..."
      },
      "models": {
        "gemma-4-12b-it-Q5_K_L": {
          "name": "Gemma 4 12B Q5"
        }
      }
    }
  }
}

Prefer not to touch config by hand? The one-click OpenCode integration in the Integrations panel writes this for you automatically.

Agents & CLI Integrations

QikLM provides one-click setup flows that bridge local inference with professional agent and developer tools, handling the base URL, API key, and model profiles for you.

Claude Code

Automated environment preparation using the Anthropic-compatible API routes.

Codex

Generates a dedicated QikLM profile for use with codex --profile qiklm.

OpenCode

Writes a QikLM provider block into your OpenCode config so your local models appear instantly in the TUI. No manual JSON required.

Hermes Agent

Writes custom provider and model profiles to instantly route through QikLM.

Beyond these, any other OpenAI- or Anthropic-API-compatible app works with QikLM. Just point it at your base URL and API key as shown in the Compatibility API above.

Hardware Tuning

Advanced optimization for modern compute runtimes and GPU offloading.

In Engine Settings, the Pre-Run Initialization field runs any shell command before your engine boots, for any setup that needs to happen first. Chain it with && so it runs ahead of the engine launch.

For vLLM, activate the Python environment it is installed in. This matters more than for llama.cpp, which ships as a self-contained binary with no Python environment to source. For example:

source /path/to/.venv/bin/activate &&

For Intel SYCL builds of llama.cpp, source the oneAPI environment. For example:

source /opt/intel/oneapi/setvars.sh > /dev/null 2>&1 &&

lmtop Observability

QikLM's native lmtop module provides zero-dependency, real-time GPU telemetry. Monitor VRAM, temperature, and power draw directly within the Admin Dashboard or PromptUI side panel.

Account Recovery

If administrative access is lost, use the CLI recovery tool from the host machine to reset your credentials.

Standard Reset qiklm account reset-password --username admin

Systemd / Managed Path Reset sudo qiklm account reset-password --home /var/lib/qiklm --username admin