First-Time Setup

QikLM is a professional orchestration layer for your local AI infrastructure. It separates the model runtime from the applications that use it, providing a stable, managed environment for high-performance inference.

Requirements

  • A llama-server build from llama.cpp. You don't need one ahead of time: use Managed Install to let QikLM download an official release straight from the llama.cpp GitHub releases page, or point it at any build you already have.
  • GGUF models, or access to GGUF models you want to download via the built-in HF Explorer.

Unlike bundled apps, QikLM never locks you to one embedded engine. You choose the build, and the engine paths and model storage locations are fully configurable via the Admin Dashboard after the first launch.

Quick Start

Run the QikLM binary to start the orchestration service. By default, QikLM listens on port 8192.

./qiklm
.\qiklm.exe    # Windows

The Web UIs

Admin Dashboard

The mission control center for managing engines, profiles, and your model library.

http://localhost:8192/admin/

PromptUI

Premium chat GUI and coding workspace with persistent projects and multimodal support.

http://localhost:8192/promptui/

Dashboard Features

  • Profiles: Create saved model profiles with per-model runtime settings and CLI flags.
  • Library: Browse and manage local GGUF files with automated discovery.
  • HF Explorer: Search Hugging Face repositories and download models directly into your library.
  • Access Control: Manage API access, LAN binding, and admin authentication.

PromptUI Workspace

PromptUI is a premium coding workspace designed to be useful while you work, not just a place to send messages. The built-in Canvas can open code blocks from a chat, let you edit them inline, preview HTML pages, and send the edited code back into the conversation for another pass.

Search & Context

Ground your models in real-time data using Brave Search with manual, automatic, or always-on modes. Optionally inject dynamic context via HTTP MCP resources.

Organization

Keep chat history searchable and organized by project. Manage persistent work threads and multimodal experiments in dedicated workspaces.

Managing llama.cpp Builds

In QikLM, a build is the local llama-server program that actually runs a model. You can keep up to five different builds registered simultaneously, allowing you to test newer features or machine-specific optimizations without reconfiguring your clients.

Managed Install

QikLM can download, unpack, and register a specific llama.cpp release for you automatically via the Dashboard.

Local Registration

If you already have a binary on disk, use Add Build to register its path without moving any files.

Default vs. Profile Builds

The Default Build is used whenever a profile does not specify a specific engine. You can override this inside individual model profiles, allowing a newer model to use a newer build while keeping your stable models on a daily driver.

Note: Deleting a build entry only removes its registration from QikLM settings; it does not delete the binary from your storage.

How Profiles Work

A profile is a saved configuration for running or reaching a model. Instead of manual terminal flags, you give your setup a name (e.g., fast-chat or vision-large) and let QikLM handle the orchestration.

Model Sources

  • Local GGUF: A model file stored in your QikLM library. These profiles can select specific registered builds.
  • Remote Server: A llama.cpp endpoint already running elsewhere. These profiles bypass local builds as the model is already being served.

Granular Control & Inheritance

Profiles manage GPU layers, context size, threads, and custom CLI flags. Think of your global Engine Settings as the "house settings." A profile can either borrow these defaults or carry its own specific overrides.

Automated Routing

When a client requests a model profile, QikLM automatically starts or switches the managed engine, builds the precise launch command, waits for readiness, and then forwards your request. For remote profiles, QikLM acts as a transparent proxy to the external endpoint.

Package Installation

QikLM is distributed as professional native packages for macOS, Linux, and Windows to ensure seamless background operation.

macOS (.pkg)

The .pkg installer is the simplest option: double-click it and follow the prompts, or install from the terminal. A .dmg is also available on the downloads page if you prefer to drag the binary into place yourself.

sudo installer -pkg qiklm_darwin_arm64.pkg -target /

Linux (.deb / .rpm)

Installing via package manager automatically sets up QikLM as a background systemd service. No manual configuration is required.

sudo dpkg -i qiklm.deb # Debian/Ubuntu sudo rpm -i qiklm.rpm # Fedora/RHEL/SUSE

Windows Service

To run QikLM as a background service on Windows, use the PowerShell New-Service cmdlet from an Administrator terminal.

New-Service -Name "QikLM" `
            -BinaryPathName "C:\path\to\qiklm.exe" `
            -DisplayName "QikLM Orchestrator" `
            -StartupType Automatic

Paths & Variables

QikLM uses platform-standard directories for configuration and data. You can inspect your specific paths using the command line:

qiklm paths

Default Directory Map

OS Config Root Data Root
Linux (User) ~/.config/qiklm/ ~/.local/share/qiklm/
Linux (Systemd) /var/lib/qiklm/config/ /var/lib/qiklm/data/
macOS ~/Library/Application Support/qiklm/ ~/Library/Application Support/qiklm/data/
Windows %AppData%\qiklm\ %LocalAppData%\qiklm\

Environment Variables

  • QIKLM_HOME Override the custom root for all config, data, and state.
  • QIKLM_PROXY_PORT Set the server port (Default: 8192).
  • QIKLM_BIND_MODE Set access mode: local_only, lan, or custom.

Remote Access

QikLM is a web application, which means any device that can reach your host can use the full Admin Dashboard and PromptUI. Switch the bind mode to LAN in Engine Settings and you're done.

LAN / Home Lab

Select LAN bind mode from Engine Settings. QikLM becomes accessible to every device on your local network.

Tailscale / VPN

Already running Tailscale, WireGuard, or ZeroTier? Set bind mode to LAN and QikLM is instantly available across your entire mesh, with no port forwarding, no tunnels, no extra configuration.

LAN Mode (Environment Override)

To start QikLM in LAN mode from the command line, listening on all interfaces:

QIKLM_BIND_MODE=lan ./qiklm

Custom Bind (Advanced)

To bind to a specific IP address (e.g., for a dedicated server or multi-homed setup):

QIKLM_BIND_MODE=custom QIKLM_BIND_ADDRESS=192.168.0.10 QIKLM_PROXY_PORT=8192 ./qiklm

Compatibility API

Any client that speaks the OpenAI or Anthropic standard can talk to QikLM through its stable local endpoint. Point the client at your QikLM base URL, authenticate with your QikLM API key, and you are connected.

Base URL

http://127.0.0.1:8192/v1

Swap 127.0.0.1 for your host's LAN or VPN address to connect from another device.

API Key

sk-qiklm-...

Generate your key in Account Settings. It protects your account when used with external harnesses, and you can regenerate it at any time to instantly revoke old access.

OpenAI Standard

/v1/chat/completions
/v1/models
/v1/responses

Anthropic Standard

/v1/messages

Example: OpenCode

Add QikLM as a provider in your OpenCode config. The same pattern applies to any OpenAI-compatible tool: set the base URL and your API key, then list the models you want to expose.

{
  "$schema": "https://opencode.ai/config.json",
  "provider": {
    "qiklm": {
      "npm": "@ai-sdk/openai-compatible",
      "name": "QikLM",
      "options": {
        "baseURL": "http://127.0.0.1:8192/v1",
        "apiKey": "sk-qiklm-..."
      },
      "models": {
        "gemma-4-12b-it-Q5_K_L": {
          "name": "Gemma 4 12B Q5"
        }
      }
    }
  }
}

Prefer not to touch config by hand? The one-click OpenCode integration in the Integrations panel writes this for you automatically.

Agents & CLI Integrations

QikLM provides one-click setup flows that bridge local inference with professional agent and developer tools, handling the base URL, API key, and model profiles for you.

Claude Code

Automated environment preparation using the Anthropic-compatible API routes.

Codex

Generates a dedicated QikLM profile for use with codex --profile qiklm.

OpenCode

Writes a QikLM provider block into your OpenCode config so your local models appear instantly in the TUI. No manual JSON required.

Hermes Agent

Writes custom provider and model profiles to instantly route through QikLM.

Beyond these, any other OpenAI- or Anthropic-API-compatible app works with QikLM. Just point it at your base URL and API key as shown in the Compatibility API above.

Hardware Tuning

Advanced optimization for modern compute runtimes and GPU offloading.

In Engine Settings, use the Pre-Run Initialization field to source environmental variables (such as Intel OneAPI / SYCL vars) before the engine boots.

source /opt/intel/oneapi/setvars.sh > /dev/null 2>&1 &&

lmtop Observability

QikLM's native lmtop module provides zero-dependency, real-time GPU telemetry. Monitor VRAM, temperature, and power draw directly within the Admin Dashboard or PromptUI side panel.

Account Recovery

If administrative access is lost, use the CLI recovery tool from the host machine to reset your credentials.

Standard Reset qiklm account reset-password --username admin
Systemd / Managed Path Reset sudo qiklm account reset-password --home /var/lib/qiklm --username admin