100% Local • Private • Offline
Bring Your Own Binary

The Qikest way to run
Local AI.

Stop babysitting your terminal.

A complete local AI app in one tiny binary. Built-in premium chat UI, a full web console, and bare-metal llama.cpp performance. No CLI, no Docker, no bloated Electron install. It's running in seconds.

One Binary, Zero Dependencies

No electron apps, no Docker, no npm, nor python installs. Boots in seconds.

Any llama.cpp Build

Drop in your own binary, use a fork, or install up to 5 releases directly from QikLM.

Smart Model Profiles

Save your cli flags once. Switch models with a click or let QikLM swap automatically on prompt.

Web UIs & Network Access

Admin dashboard, PromptUI chat, and GPU monitoring from any browser on your network.

Run Qwen 3, Gemma 4, Llama 4, DeepSeek R1, Mistral, Phi-4, GLM-4, and any GGUF model locally.

Download QikLM

Free for personal use.

QikLM on Desktop, Tablet, and Mobile
Click to Enlarge

Professionalize Your Local Inference.

Local AI is powerful but managing it shouldn't feel like a second job. QikLM handles the complexity so you can focus on using your models.

Seamless Agent Integrations

Connect Claude Code, Codex, Hermes Agent, OpenCode, and other OpenAI-compatible tools through the built-in integrations panel.

Bring Your Own Binary (BYOB)

Stop waiting on proprietary launchers to update their embedded engines. Use any llama.cpp version to access the latest features instantly, or let QikLM install a managed release for a quick setup.

100% Offline & Network-Ready

Zero cloud telemetry. QikLM runs on your hardware and serves a Web UI you can reach from any device on your network. Already on Tailscale or a VPN? Manage your local AI from anywhere, with no extra setup.

Orchestration.
Not Configuration.

QikLM sits between your apps and llama.cpp, turning manual terminal work into a persistent background service. Your apps talk to QikLM, and QikLM handles the rest.

Instant Boot

Automatically starts llama.cpp on your first prompt.

Unified Control

Manage models, builds, and profiles from one place.

Lifecycle Sync

Automatic loading and unloading of server processes.

Seamless Swapping

Hot-swap models mid-chat without manual restarts.

lmtop

Native Hardware Observability

Universal GPU Telemetry Engine

Live
81%

VRAM

12.9/16G

49°

Temp

49% Scl

0%

Load

Idle

6W

Power

6% Scl

Custom Architecture lmtop is QikLM's native telemetry engine: built from scratch to deliver zero-dependency, real-time observability for professional AI workflows.

Engine Precision

Engine Control.
Without Limits.

QikLM gives you full command over your model weights. Save configurations as Profiles to instantly load your favorite setups, and use Templates to standardize recurring parameter values across your entire library. No more babysitting terminal switches and scripts.

Reusable Profile Templates

Create templates to instantly seed new model profiles with your preferred defaults instead of entering the same values every time.

Profile-Based Startups

Save your setup once. Configure all the necessary flags you need for each model directly in your profile. You can even choose different llama.cpp builds per model profile.

Full CLI Passthrough

The UI presets cover the common flags, but every profile also has a raw CLI argument field. Pass any llama.cpp flag you need and save it to the profile.

localhost:8192/models/config
Advanced Model Configuration
Engine Configuration Dashboard
Model Library
View Detail
Centralized Storage

Automated Library
Management.

QikLM organizes your weights into a unified model directory, automatically maps LLM files to searchable profiles, and manages your entire collection through a clean, actionable hub.

  • Automated discovery and profile mapping.
  • Reusable Model Templates for faster setups.
  • One-click deletions and library refreshes.
Hugging Face Integration

Explore the
Model Hub.

Easily browse and download your favorite open weights including Llama, Qwen, Gemma, Deepseek, Mistral, and millions more directly into your library without ever leaving the UI.

Direct Search

Universal HF repository index.

Tree View

Inspect quants before downloading.

Open Model Hub
View Detail
PromptUI Workspace
View Detail
Interactive Workspace

PromptUI: The
Built-in Chat Interface.

A premium chat GUI and coding workspace with a built-in Interactive Canvas for live code collaboration and webpage previews. Monitor performance with lmtop gpu monitor directly in the slideout panel.

Workspace Projects

Manage persistent chat histories across dedicated projects.

Documents & OCR

Attach images, PDFs, and Office documents. Built-in OCR extracts text from images and scanned PDFs automatically.

Interactive Canvas

Live-edit code and preview HTML pages directly in the workspace.

Brave Web Search

Optional live search integration to ground models in real-time data.

MCP Resources

Inject dynamic context via user-selected HTTP MCP endpoints.

Expand Your Workflow

Designed to Play Nice.

QikLM isn't just a standalone tool: it's a high-performance orchestration core that powers your favorite local AI frontends, agentic workflows, and any OpenAPI-compatible interface.

Claude Code

  • Native support for Anthropic's Claude Code CLI.
  • Automated environment preparation and model bridging.

Codex

  • Dedicated profile support via the QikLM Responses API.
  • Execute codex --profile qiklm against any local profile.

Hermes Agent

  • Custom provider and model profiles for instant routing.
  • Native compatibility with high-performance local inference.

OpenCode

  • Zero-config model discovery in the OpenCode TUI.
  • Hot-swap profiles without restarting your terminal.

Tons of features built-in

QikLM ships with a massive feature set designed to make local AI management effortless.

Bring Your Own Keys

Need a cloud model for a task? Add your own OpenAI, Anthropic, or OpenAI-compatible key and use it right alongside your local models. Optional, and entirely yours.

Smart Model Profiles

Save common engine configs for every model including GPU layers, context, threads and any cli args you wish to use.

lmtop Gauges

Real-time hardware observability. Monitor VRAM, power, and thermals in live gauges.

Model Library

Turn your weights folder into a proper management hub with file discovery and scanning.

HF Explorer

Search Hugging Face, inspect file trees, and download LLMs directly to your storage.

Auto VRAM Unload

Automatically frees GPU memory after periods of inactivity to save power and resources.

Profile-Based Startups

Stop remembering complex terminal flags. Save your entire configuration into a profile, pick your preferred engine build, and launch with a single click from the UI.

Ecosystem Ready

Seamlessly power Claude Code, Codex, Hermes Agent, OpenCode, OpenClaw, and any OpenAI-compatible completions and responses API with QikLM's orchestration core.

Zero-Bloat Deployment

A single, lightweight binary with no dependencies. Designed to stay fast, stay out of your way, and run anywhere.

Systemd Ready

Professional .deb and .rpm packages that set up QikLM as a background systemd service. No separate daemons required.

Model Swapping

Seamlessly swap backend inference engines mid-chat with any OpenAI-compatible client.

Multi-Binary Support

Swap between different llama.cpp builds and versions per model profile. Maintain multiple engine forks and releases to ensure peak performance for every model.

OpenAI Compatible

Support for v1/completions and v1/responses endpoints for all major modern AI frontends.

Local. Private. Fast.

Get started with QikLM and take full control of your local AI setup.

Download QikLM

Free for personal use.

QikLM Logo