First-Time Setup
QikLM is a professional orchestration layer for your local AI infrastructure. It separates the model runtime from the applications that use it, providing a stable, managed environment for high-performance inference.
Requirements
- A
llama-serverbuild from llama.cpp. You don't need one ahead of time: use Managed Install to let QikLM download an official release straight from the llama.cpp GitHub releases page, or point it at any build you already have. - GGUF models, or access to GGUF models you want to download via the built-in HF Explorer.
Unlike bundled apps, QikLM never locks you to one embedded engine. You choose the build, and the engine paths and model storage locations are fully configurable via the Admin Dashboard after the first launch.
Quick Start
Run the QikLM binary to start the orchestration service. By default, QikLM
listens on port 8192.
./qiklm
.\qiklm.exe # Windows
The Web UIs
Admin Dashboard
The mission control center for managing engines, profiles, and your model library.
http://localhost:8192/admin/
PromptUI
Premium chat GUI and coding workspace with persistent projects and multimodal support.
http://localhost:8192/promptui/
Dashboard Features
- Profiles: Create saved model profiles with per-model runtime settings and CLI flags.
- Library: Browse and manage local GGUF files with automated discovery.
- HF Explorer: Search Hugging Face repositories and download models directly into your library.
- Access Control: Manage API access, LAN binding, and admin authentication.
PromptUI Workspace
PromptUI is a premium coding workspace designed to be useful while you work, not just a place to send messages. The built-in Canvas can open code blocks from a chat, let you edit them inline, preview HTML pages, and send the edited code back into the conversation for another pass.
Search & Context
Ground your models in real-time data using Brave Search with manual, automatic, or always-on modes. Optionally inject dynamic context via HTTP MCP resources.
Organization
Keep chat history searchable and organized by project. Manage persistent work threads and multimodal experiments in dedicated workspaces.
Managing llama.cpp Builds
In QikLM, a build is the local llama-server program that actually runs
a model. You can keep up to five different builds registered simultaneously, allowing you to test
newer features or machine-specific optimizations without reconfiguring your clients.
Managed Install
QikLM can download, unpack, and register a
specific llama.cpp release for you automatically via the Dashboard.
Local Registration
If you already have a binary on disk, use Add Build to register its path without moving any files.
Default vs. Profile Builds
The Default Build is used whenever a profile does not specify a specific engine. You can override this inside individual model profiles, allowing a newer model to use a newer build while keeping your stable models on a daily driver.
Note: Deleting a build entry only removes its registration from QikLM settings; it does not delete the binary from your storage.
How Profiles Work
A profile is a saved configuration for running or reaching a model. Instead of
manual terminal flags, you give your setup a name (e.g., fast-chat or
vision-large) and let QikLM handle the orchestration.
Model Sources
- Local GGUF: A model file stored in your QikLM library. These profiles can select specific registered builds.
- Remote Server: A
llama.cppendpoint already running elsewhere. These profiles bypass local builds as the model is already being served.
Granular Control & Inheritance
Profiles manage GPU layers, context size, threads, and custom CLI flags. Think of your global Engine Settings as the "house settings." A profile can either borrow these defaults or carry its own specific overrides.
Automated Routing
When a client requests a model profile, QikLM automatically starts or switches the managed engine, builds the precise launch command, waits for readiness, and then forwards your request. For remote profiles, QikLM acts as a transparent proxy to the external endpoint.
Package Installation
QikLM is distributed as professional native packages for macOS, Linux, and Windows to ensure seamless background operation.
macOS (.pkg)
The .pkg installer is the simplest option:
double-click it and follow the prompts, or install from the terminal. A .dmg is
also available on the downloads page if you prefer to drag the binary into place yourself.
sudo installer -pkg qiklm_darwin_arm64.pkg -target /
Linux (.deb / .rpm)
Installing via package manager automatically sets up QikLM
as a background systemd service. No manual configuration is required.
sudo dpkg -i qiklm.deb # Debian/Ubuntu
sudo rpm -i qiklm.rpm # Fedora/RHEL/SUSE
Windows Service
To run QikLM as a background service on Windows, use the
PowerShell New-Service cmdlet from an Administrator terminal.
New-Service -Name "QikLM" `
-BinaryPathName "C:\path\to\qiklm.exe" `
-DisplayName "QikLM Orchestrator" `
-StartupType Automatic
Paths & Variables
QikLM uses platform-standard directories for configuration and data. You can inspect your specific paths using the command line:
qiklm paths
Default Directory Map
| OS | Config Root | Data Root |
|---|---|---|
| Linux (User) | ~/.config/qiklm/ |
~/.local/share/qiklm/ |
| Linux (Systemd) | /var/lib/qiklm/config/ |
/var/lib/qiklm/data/ |
| macOS | ~/Library/Application Support/qiklm/ |
~/Library/Application Support/qiklm/data/ |
| Windows | %AppData%\qiklm\ |
%LocalAppData%\qiklm\ |
Environment Variables
-
QIKLM_HOMEOverride the custom root for all config, data, and state. -
QIKLM_PROXY_PORTSet the server port (Default:8192). -
QIKLM_BIND_MODESet access mode:local_only,lan, orcustom.
Remote Access
QikLM is a web application, which means any device that can reach your host can use the full Admin Dashboard and PromptUI. Switch the bind mode to LAN in Engine Settings and you're done.
LAN / Home Lab
Select LAN bind mode from Engine Settings. QikLM becomes accessible to every device on your local network.
Tailscale / VPN
Already running Tailscale, WireGuard, or ZeroTier? Set bind mode to LAN and QikLM is instantly available across your entire mesh, with no port forwarding, no tunnels, no extra configuration.
LAN Mode (Environment Override)
To start QikLM in LAN mode from the command line, listening on all interfaces:
QIKLM_BIND_MODE=lan ./qiklm
Custom Bind (Advanced)
To bind to a specific IP address (e.g., for a dedicated server or multi-homed setup):
QIKLM_BIND_MODE=custom QIKLM_BIND_ADDRESS=192.168.0.10 QIKLM_PROXY_PORT=8192 ./qiklm
Compatibility API
Any client that speaks the OpenAI or Anthropic standard can talk to QikLM through its stable local endpoint. Point the client at your QikLM base URL, authenticate with your QikLM API key, and you are connected.
Base URL
http://127.0.0.1:8192/v1
Swap 127.0.0.1 for your
host's LAN or VPN address to connect from another device.
API Key
sk-qiklm-...
Generate your key in Account Settings. It protects your account when used with external harnesses, and you can regenerate it at any time to instantly revoke old access.
OpenAI Standard
/v1/chat/completions/v1/models/v1/responses
Anthropic Standard
/v1/messages
Example: OpenCode
Add QikLM as a provider in your OpenCode config. The same pattern applies to any OpenAI-compatible tool: set the base URL and your API key, then list the models you want to expose.
{
"$schema": "https://opencode.ai/config.json",
"provider": {
"qiklm": {
"npm": "@ai-sdk/openai-compatible",
"name": "QikLM",
"options": {
"baseURL": "http://127.0.0.1:8192/v1",
"apiKey": "sk-qiklm-..."
},
"models": {
"gemma-4-12b-it-Q5_K_L": {
"name": "Gemma 4 12B Q5"
}
}
}
}
}
Prefer not to touch config by hand? The one-click OpenCode integration in the Integrations panel writes this for you automatically.
Agents & CLI Integrations
QikLM provides one-click setup flows that bridge local inference with professional agent and developer tools, handling the base URL, API key, and model profiles for you.
Automated environment preparation using the Anthropic-compatible API routes.
Generates a dedicated QikLM profile for use with
codex --profile qiklm.
Writes a QikLM provider block into your OpenCode config so your local models appear instantly in the TUI. No manual JSON required.
Writes custom provider and model profiles to instantly route through QikLM.
Beyond these, any other OpenAI- or Anthropic-API-compatible app works with QikLM. Just point it at your base URL and API key as shown in the Compatibility API above.
Hardware Tuning
Advanced optimization for modern compute runtimes and GPU offloading.
In Engine Settings, use the Pre-Run Initialization field to source environmental variables (such as Intel OneAPI / SYCL vars) before the engine boots.
source /opt/intel/oneapi/setvars.sh > /dev/null 2>&1 &&
lmtop Observability
QikLM's native lmtop module provides zero-dependency, real-time GPU telemetry. Monitor VRAM, temperature, and power draw directly within the Admin Dashboard or PromptUI side panel.
Account Recovery
If administrative access is lost, use the CLI recovery tool from the host machine to reset your credentials.
qiklm account reset-password --username admin
sudo qiklm account reset-password --home /var/lib/qiklm --username admin