ManageLM Documentation
Manage your Linux and Windows servers with natural language — securely, instantly, at scale.
Overview
ManageLM is a remote server management platform. Instead of SSH-ing into servers and running commands manually, you describe what you want in plain English and ManageLM takes care of the rest.
ManageLM is available in two editions: self-hosted, where the portal runs on your own infrastructure via package installer or Docker, and SaaS, hosted by ManageLM.
- Portal — The control plane (this web app). Manages accounts, agents, skills, and bridges communication.
- Agent — A lightweight daemon on each managed Linux or Windows server. Receives tasks, uses an LLM to interpret them, executes commands, and reports back.
- Claude — Connects via MCP (Model Context Protocol) to the portal. You talk to Claude, Claude talks to your servers.
How It Works
- You ask Claude to do something on a server, e.g. "Restart nginx on web-01".
- Claude calls a tool on the portal via MCP. The tool is auto-generated from the agent's assigned skills.
- The portal dispatches the task to the target agent over a persistent WebSocket connection.
- The configured LLM (local Ollama/LM Studio, or a cloud provider) interprets the task and generates the shell commands.
- Commands are validated against the skill's allowlist before execution — only explicitly permitted commands can run.
- Results flow back through the agent → portal → Claude → you.
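As a rough mental model (the JSON below is purely illustrative; every field name is invented and not part of the actual wire format), a dispatched task can be pictured as:

```shell
# Illustration only: hypothetical shape of a task flowing portal -> agent.
# None of these field names are guaranteed by ManageLM.
task='{"target": "web-01", "skill": "services", "instruction": "Restart nginx"}'
echo "$task"
```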
What you can do
Just describe what you need in plain English. Here are examples across skills:
| Category | Example prompt |
|---|---|
| Services | "Restart nginx on web-01 and show me the last 20 log lines" |
| Packages | "Update all packages on production servers" |
| Users | "Add SSH access for Charly on user deploy on pocmail" |
| Security | "Run a security audit on all servers and email me a summary" |
| Access | "Who has sudo on production servers?" |
| Activity | "Run an activity audit on dev and show who logged in today" |
| Files | "Add a server block for api.example.com to nginx on web-01" |
| Firewall | "Open port 8080 on staging servers" |
| Containers | "List all running Docker containers on docker-01 and show which ones use more than 1GB memory" |
| Certificates | "Check TLS certificate expiry on all web servers" |
| Backups | "Show me the last backup status for every agent and which ones are failing" |
| Database | "Show the slow query log for MySQL on db-01" |
| Monitoring | "Which servers have disk usage above 85%?" |
| Multi-server | "Check if chrony is running on all servers, install it where it's missing" |
| LLM | "Pull llama3.2 on the Ollama server and test it with a simple prompt" |
These are not templates — you can phrase requests however you want. The agent interprets intent and adapts to each server's OS (Linux or Windows), package manager, and configuration.
Quick Start
From zero to managing a server with natural language — in under 10 minutes.
What You'll Need
- A Linux server (Ubuntu, Debian, RHEL, Rocky, Alma, Fedora, etc.) or Windows Server you want to manage
- Root/sudo access (Linux) or Administrator access (Windows)
- Python 3.9+ and curl installed (Linux), or Python 3.9+ and PowerShell 7+ (Windows)
- A web browser to access the ManageLM portal
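Before running the installer, you can sanity-check the Linux prerequisites yourself (a convenience sketch, not part of the product):

```shell
# Verify the two Linux prerequisites listed above: Python 3.9+ and curl.
if python3 -c 'import sys; raise SystemExit(0 if sys.version_info >= (3, 9) else 1)'; then
  echo "python3 OK"
else
  echo "python3 missing or older than 3.9"
fi
command -v curl >/dev/null 2>&1 && echo "curl OK" || echo "curl missing"
```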
- Create an account — Register on the portal and verify your email.
- Configure the LLM — Go to Settings → Account. Choose Local LLM (install Ollama and run ollama pull qwen3.5:9b) or Cloud LLM (enter a provider API key).
- Import skills — Go to Agent Skills → Catalog and import the skills your agent will need. Start with system, files, services, packages, and users.
- Install the agent — Click Add Agent in the dashboard, copy the install command, and run it on your server.
- Approve the agent — The portal detects the enrollment automatically. Verify the hostname and click Approve.
- Assign skills — Click on the agent, scroll to Assigned Skills, and assign the skills you imported.
- Connect Claude — Copy the MCP connector details from Settings → MCP & API into Claude Desktop or Claude Code.
- Run your first task — Ask Claude: "Show me the system info on web-01", or use the portal's Run Task button directly.
Create an Account
- Navigate to the portal and click Register.
- Enter your first name, last name, email, and password.
- Check your email for a verification link and click it.
- Log in to the portal. You're now the owner of your account.
Install an Agent
Agents are installed on any Linux or Windows server you want to manage. The install is a single command.
- Log in to the portal and go to My Agents.
- Click Add Agent.
- Optionally select a server group.
- Copy the install command and run it on your server:
Linux
curl -fsSL "https://your-portal/install.sh?token=..." | sh
The Linux installer will:
- Check prerequisites (Python 3.9+, curl)
- Download agent files to /opt/managelm/
- Install Python dependencies
- Enroll the agent with the portal
- Wait for your approval
- Set up a systemd service that starts automatically
Windows
On Windows, the portal provides a PowerShell install script. Copy it from the Add Agent modal (Windows tab) and run it in an elevated PowerShell session. The Windows installer performs the same enrollment steps and registers the agent as a Windows service.
Approve the Agent
After the install script runs, the agent appears in the portal as pending approval.
- The portal's Add Agent modal will automatically detect the new enrollment and show an approval prompt.
- Verify the hostname and click Approve.
- The agent receives its access token and connects via WebSocket.
- A green Connected indicator appears in the agent list.
You can also approve agents from the agent list by clicking the Approve button on any pending agent.
Set Up the LLM
Each agent uses an LLM to interpret tasks and generate commands. Configure from Settings → Account.
Option 1: Local LLM (Recommended)
Install Ollama or LM Studio for full data privacy — your commands and data never leave your infrastructure. The LLM can run on the agent server itself or on a dedicated machine accessible by your agents.
# Install Ollama
curl -fsSL https://ollama.com/install.sh | sh
# Pull a recommended model
ollama pull qwen3.5:9b
Ollama listens on http://localhost:11434 by default. If Ollama runs on a separate server, set the LLM API URL to its address (e.g. http://llm-server:11434) in Settings → Account.
Recommended local models
ManageLM agents need an LLM that reliably follows structured output formats (<cmd> tags, <done/> markers). For IT agent workloads — generating shell commands, managing services, parsing logs — models with strong instruction following perform best. We recommend the Qwen 3.5 family for the best balance across all hardware tiers, and the Gemma 4 family (E4B, 26B-A4B MoE, 31B Dense) when you need native multimodal support or the very long 256K-token context window for large log / config analysis.
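To see why that matters, here is an invented transcript (only the <cmd> and <done/> tag names come from the paragraph above): the agent mechanically extracts the command from between the tags, so a model that drifts from the format produces nothing runnable.

```shell
# Hypothetical LLM reply; the agent executes only what sits inside <cmd>...</cmd>.
reply='Restarting the service now. <cmd>systemctl restart nginx</cmd> <done/>'
cmd=$(printf '%s' "$reply" | sed -n 's/.*<cmd>\(.*\)<\/cmd>.*/\1/p')
echo "$cmd"   # systemctl restart nginx
```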
All VRAM figures below assume 4-bit quantization (Q4), which is the default for Ollama/LM Studio and keeps quality within ~2–5% of full precision while cutting memory by roughly 60%. Add 1–3 GB of overhead for the runtime, KV cache, and typical context — more for long contexts on dense models.
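As a back-of-the-envelope check (our own approximation, not an official formula): Q4 stores roughly 4.5 bits per parameter, i.e. about 0.56 GB per billion parameters, plus the overhead mentioned above.

```shell
# Rough Q4 VRAM estimate: params (in billions) * ~0.5625 GB + overhead (GB).
# The 0.5625 factor (~4.5 bits/param) is our approximation, including
# quantization block metadata; real usage varies by runtime and context length.
estimate_vram_gb() {
  awk -v p="$1" -v o="$2" 'BEGIN { printf "%.0f\n", p * 0.5625 + o }'
}
estimate_vram_gb 27 2   # 27B model + 2 GB overhead: ~17 GB
```

With 2 GB of overhead this reproduces the ~17 GB figure quoted for a 27B model below; treat larger deviations in the tables as runtime- and context-dependent.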
CPU-only servers (8–16 GB RAM)
| Model | Size | RAM | Notes |
|---|---|---|---|
| gemma-4-e4b | E4B (4B effective) | ~5 GB | Gemma 4 edge model — native multimodal (text, image, audio, video), 128K context. Runs on modest CPU or 8 GB class GPU. |
| qwen3.5:9b | 9B | ~7 GB | Best balance of speed and accuracy for CPU-only servers. |
| qwen3.5:4b | 4B | ~4 GB | Lightweight option for constrained servers or simple skills. |
| ministral-3:8b | 8B | ~6 GB | Mistral’s edge model with strong function calling and 128K context. Good alternative to qwen3.5:9b when Mistral’s instruction style fits your skills better. |
GPU servers (16–24 GB VRAM)
| Model | Size | VRAM | Notes |
|---|---|---|---|
| gemma-4-26b-a4b | 26B MoE (3.8B active) | ~18 GB | Mixture-of-Experts — only 3.8B active parameters at inference, so tokens-per-second are close to a 4B model while quality is close to a 26B. 256K context (memory stays modest: ~18 GB at 4K → ~23 GB at 256K). Fits RTX 3090/4090. |
| qwen3.5:27b | 27B | ~17 GB | Excellent quality for complex sysadmin tasks. Fits most GPUs (RTX 3090/4090). |
| qwen3.5:35b | 35B | ~24 GB | Top quality at this tier. Needs RTX 4090 or A5000. |
| mistral-small3.2 | 24B | ~16 GB | Mistral’s small model with strong function calling and instruction following. |
| ministral-3:14b | 14B | ~10 GB | Mid-tier Mistral model — fast tokens-per-second on consumer GPUs (RTX 3080/4070+) with solid tool-use. Leaves headroom for long contexts or parallel skills. |
High-end hardware (48+ GB — Mac Studio, DGX Spark, multi-GPU)
| Model | Size | Memory | Notes |
|---|---|---|---|
| gemma-4-31b | 31B Dense | ~20–40 GB | Google’s flagship dense Gemma 4 — top-tier open-weights quality with a 256K context window. ~20 GB at 4K context, scaling up to ~40 GB when filling the full 256K. Best on 48 GB cards (A6000, RTX 6000 Ada) for long-context work; fits on a single RTX 4090 at short contexts. |
| llama3.3:70b | 70B | ~45 GB | Full precision. Strong tool-use and structured output. Near cloud-LLM quality. |
| qwen3.5:35b | 35B | ~24 GB | Excellent quality with headroom for large context and throughput. |
| mistral-small3.2 | 24B | ~16 GB | Strong instruction following. Efficient for multi-model setups. |
Tip: You can mix models across agents, using a larger model for complex skills like containers or kubernetes and qwen3.5:9b for simple skills like system or users. This optimizes both quality and throughput.
Option 2: Cloud LLM
Use an external cloud provider instead of running a local LLM. Supported providers:
| Provider | Example models |
|---|---|
| Anthropic (Claude) | claude-opus-4-7, claude-sonnet-4-6, claude-haiku-4-5-20251001 |
| OpenAI (ChatGPT) | gpt-5, gpt-5-mini, gpt-5-nano, gpt-4.1, o3, o4-mini |
| Google (Gemini) | gemini-2.5-flash, gemini-2.5-flash-lite, gemini-2.5-pro, gemini-3-flash-preview, gemini-3.1-pro-preview, gemini-3.1-flash-lite-preview |
| xAI (Grok) | grok-4, grok-4-fast, grok-3, grok-3-mini |
| Groq | llama-3.3-70b-versatile, llama-4-scout-17b-16e-instruct, deepseek-r1-distill-llama-70b |
| Mistral | mistral-large-latest, codestral-latest, ministral-8b-latest |
| DeepSeek | deepseek-chat, deepseek-reasoner |
Select the provider and model from the dropdown, enter your API key, and click Test to verify the connection before saving.
LLM Access Mode
In self-hosted mode, you can choose how agents access the LLM:
- Direct (default) — Each agent calls the LLM directly. The API key is sent to the agent.
- Proxied — Agents route LLM calls through the portal. The API key stays on the portal server and is never sent to agents. This is useful for centralized key management or when agents should not have direct network access to the LLM provider.
Set the access mode from Settings → Account using the Direct / Proxied toggle. Agents with per-agent LLM overrides always use direct access regardless of this setting.
Assign Skills
Skills define what an agent is allowed to do. Without skills, an agent can only perform read-only operations.
- Go to Agent Skills in the sidebar.
- Click Catalog to browse the built-in skills.
- Import the skills you need (e.g. "Systemd Service Management", "Package Management").
- Navigate to your agent's detail page.
- In the Assigned Skills section, click the skill buttons to assign them.
Built-in Skills (31 total)
Skills are available for both Linux and Windows agents. On Linux, skills use shell commands (bash); on Windows, skills use PowerShell-based equivalents. The skill catalog includes platform-appropriate commands for each OS.
| Skill | What it can do |
|---|---|
base | Core read-only utilities (file reading, search, system info, resource usage, network diagnostics). Auto-assigned to all agents. |
system | System info, performance, hostname, timezone, kernel, reboot. |
files | Create, read, write, move, copy, delete files. Permissions, compression, upload/download. |
services | Start, stop, restart services. Systemd on Linux, Windows Services on Windows. Cron jobs, timers, scheduled tasks, process management. |
packages | Install, update, remove packages. Linux: apt, dnf, yum, pacman, zypper, snap. Windows: Chocolatey, winget, MSI. |
users | Create/manage system users, groups, SSH keys, and sudo access. |
network | Configure interfaces, routes, DNS, diagnose connectivity and ports. |
firewall | Manage firewall rules. Linux: UFW, firewalld, iptables, nftables. Windows: Windows Firewall (netsh/PowerShell). |
storage | Disks, partitions, filesystems, LVM, RAID, mounts, and swap. |
security | Auditing, hardening. Linux: fail2ban, SELinux/AppArmor, SSH config. Windows: Windows Defender, BitLocker, Group Policy, Windows Firewall. Intrusion detection. |
certificates | SSL/TLS certificates, Let's Encrypt, CAs, Java keystores, trust stores. |
logs | View, search, and analyze system and application logs (read-only). |
monitoring | System health, resource usage, disk/network I/O, service checks. |
containers | Docker, Podman, Buildah, images, volumes, networks, Compose. |
webserver | Nginx, Apache, Caddy, Tomcat — sites, configs, SSL, reverse proxy. |
webapps | Node.js, Python, PHP, Ruby, Java apps — PM2, Gunicorn, Supervisor. |
database | MySQL, PostgreSQL, SQLite — queries, schemas, users, backups. |
nosql | MongoDB, Redis, Elasticsearch — data operations, backups, clusters. |
git | Git repositories — clone, pull, push, branches, deployment workflows. |
backup | Backup and restore with rsync, tar, cron — files, dirs, databases. |
dns | BIND, Unbound, dnsmasq — zones, records, resolver configuration. |
email | Postfix, Dovecot, queues, aliases, DKIM, SPF, spam filtering. |
vpn | WireGuard, OpenVPN, IPsec — tunnels, peers, keys. |
virtualization | KVM/QEMU, libvirt, LXC/LXD, Proxmox, Vagrant. |
kubernetes | Pods, deployments, services, Helm, scaling, troubleshooting. |
proxy | Squid, Varnish, HAProxy — reverse proxy, caching, load balancing. |
messagequeue | RabbitMQ, Kafka, NATS, ActiveMQ — queues, consumers, messages. |
filesharing | NFS, Samba/SMB, FTP/SFTP, WebDAV. |
ldap | OpenLDAP, FreeIPA, SSSD — directory services, centralized auth. |
automation | Ansible, Terraform, cloud-init — infrastructure as code. |
llm | Ollama, vLLM, llama.cpp — local LLM server and model management. |
Every skill enforces a strict command allowlist. For example, an agent whose only skill is services can run only systemctl and journalctl on Linux; on Windows, only Get-Service, Restart-Service, etc. The agent rejects any command not on the list. An agent with no skills can only run read-only commands.
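A minimal sketch of how such allowlist validation can work (our illustration; the real validator is internal to the agent and certainly stricter): compare the command's first token against the skill's allowed commands.

```shell
# Sketch: accept a command only if its binary is on the space-separated allowlist.
is_allowed() {
  bin=${1%% *}                   # first token of the command line
  case " $2 " in
    *" $bin "*) echo allowed ;;  # binary is on the skill's allowlist
    *)          echo rejected ;; # anything else is refused
  esac
}
is_allowed 'systemctl restart nginx' 'systemctl journalctl'   # allowed
is_allowed 'rm -rf /'               'systemctl journalctl'    # rejected
```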
Choose Your Interface
ManageLM is not tied to a single tool. You can manage your servers from Claude, ChatGPT, your terminal, the web portal, VS Code, Slack, or n8n — pick whatever fits your workflow.
| Scenario | Claude MCP | ChatGPT | Shell | Portal | VS Code | Slack | n8n |
|---|---|---|---|---|---|---|---|
| Natural language tasks | ✓ | ✓ | ✓ | ✓ | ✓ | Slash cmds | Structured |
| Multi-step reasoning | ✓ Best | ✓ | — | — | ✓ | — | Workflows |
| Scheduled & automated tasks | Via portal | Via portal | ✓ Cron | ✓ Built-in | — | Webhooks | ✓ Native |
| Security audits & reports | ✓ | ✓ | ✓ | ✓ + PDF | ✓ | — | ✓ |
| Fleet operations | ✓ | ✓ | ✓ | ✓ Bulk select | ✓ | ✓ | ✓ |
| CI/CD & scripting | — | — | ✓ Best | ✓ API | — | Alerts | ✓ Best |
| Team collaboration | Per user | Per user | Per user | ✓ RBAC + audit | Per user | ✓ Shared channels | ✓ Shared |
| Offline / air-gapped | ✗ | ✗ | ✓ | ✓ Self-hosted | ✗ | ✗ | ✓ Self-hosted |
Connect Claude
ManageLM integrates with Claude via the Model Context Protocol (MCP). Claude sees your servers as tools it can call.
Claude Pro / Max
- Go to Settings → MCP & API in the portal.
- Find the Claude MCP Connector section.
- In Claude Desktop, go to Settings → Custom Connectors → Add.
- Copy the four fields (Name, Remote MCP URL, OAuth Client ID, OAuth Client Secret) from the portal and paste them into Claude.
This uses OAuth 2.0 with PKCE, the standard MCP authentication method. Custom Connectors require a Claude Pro or Max plan.
Claude Team
On Claude Team plans, the organization admin sets up the connector once, then team members connect it individually with their own ManageLM credentials.
Admin setup:
- Go to Organization Settings → Connectors → Add.
- Select Custom → Web.
- Fill in the same four fields (Name, Remote MCP URL, OAuth Client ID, OAuth Client Secret) from the portal's Settings → MCP & API section and save.
Team members:
- Go to Settings → Connectors in Claude.
- Connect the ManageLM connector — it will authenticate with their own ManageLM credentials.
Claude Free
On Claude's Free plan, Custom Connectors are not available. Instead, you can add the MCP server to your claude_desktop_config.json file using mcp-remote as a bridge with header-based authentication.
- Go to Settings → MCP & API in the portal and expand the Claude Desktop Free Plan section to copy the config.
- Open your claude_desktop_config.json file:
  - macOS: ~/Library/Application Support/Claude/claude_desktop_config.json
  - Windows: %APPDATA%\Claude\claude_desktop_config.json
  - Linux: ~/.config/Claude/claude_desktop_config.json
- Paste the following (replace the URL, credentials, and npx path with your values):
{
"mcpServers": {
"ManageLM": {
"command": "/full/path/to/npx",
"args": [
"mcp-remote",
"https://your-portal/mcp",
"--header", "X-MCP-Id:your-client-id",
"--header", "X-MCP-Secret:your-secret"
]
}
}
}
Replace /full/path/to/npx with the actual path to npx on your system (run which npx to find it). The credentials are available in Settings → MCP & API (click "Rotate Secret" if the secret is not yet generated). Save the file and restart Claude Desktop.
Claude Code
You can also configure the MCP server in Claude Code's JSON config using the same format:
{
"mcpServers": {
"ManageLM": {
"command": "npx",
"args": [
"mcp-remote",
"https://your-portal/mcp",
"--header", "X-MCP-Id:your-client-id",
"--header", "X-MCP-Secret:your-secret"
]
}
}
}
What Claude sees
Once connected, Claude gets one tool per skill slug (e.g. system, services, files). Each tool takes two parameters:
- target — Agent hostname, server group name, or "all".
- instruction — Natural language description of the task.
For example, Claude calls the services tool with target: "web-01" and instruction: "restart nginx".
Claude also gets built-in meta-tools:
- list_agents — List your servers with status, OS, health metrics, LLM readiness, and groups
- get_agent_info — Get detailed info for a single agent (OS, version, health, LLM status, assigned skills, recent tasks)
- list_agent_skills — See what an agent can do (assigned, group-inherited, and unassigned skills)
- list_available_skills — Discover catalog skills not yet imported into your account
- get_account_info — Check your plan, usage limits, and current consumption
- list_team_members — List account users with roles, permissions, and registered SSH public keys
- search_agents — Filter agents by health metrics, OS, status, group, or free text
- search_inventory — Search system inventory across all agents
- search_security — Search security audit findings across all agents
- search_ssh_keys — Search SSH keys: registered profile keys + deployed keys on servers with identity mapping
- search_sudo_rules — Search sudo privileges across all agents
- run_security_audit — Trigger a security audit and wait for results
- run_inventory_scan — Trigger an inventory scan and wait for results
- run_access_scan — Trigger an SSH & sudo access scan and wait for results
- run_activity_audit — Trigger an activity audit and wait for results
- get_task_status — Check on a running task
- get_task_history — View recent commands on a server
- get_task_changes — View file changes made by a task
- answer_task — Answer a question from an interactive task
- revert_task — Revert file changes made by a task
- send_email — Send yourself a report or summary email
Note: The tool list is fetched at connection time. If skills are added or removed during a session, Claude won't see the changes until you reconnect (restart Claude Desktop or re-open Claude Code).
Run Tasks
You can run tasks in two ways:
Via Claude (MCP)
Just describe what you want in natural language:
- "Restart the nginx service on web-01"
- "Install htop on all servers in the production group"
- "Check disk usage on db-01"
- "Show the last 50 lines of the postgresql log on db-01"
Via the Portal UI
- Click on an online agent in the dashboard.
- Click the Run Task button.
- Select a skill from the dropdown.
- Type a natural-language instruction describing what you want.
- Click Execute.
Task results appear in the Command History section on the agent detail page and in the Request Log page.
Agent CLI Tools
Three on-host commands ship with the agent and talk to the local daemon over a Unix socket — no network, no portal round-trip. They reuse the same skill gate, command validator, and kernel sandbox as portal tasks, and work offline whenever the local LLM is configured. All three are installed in /opt/managelm/bin/ (Linux) or C:\ProgramData\ManageLM\bin\ (Windows) and also exposed on the PATH.
The CLI tools run as root / LocalSystem on the managed host. They bypass portal user identity (no per-user RBAC) but still go through the skill's allowed-commands validator and the kernel sandbox, and every task is forwarded to the portal's audit log with source shell.
managelm-shell — Interactive terminal
A natural-language REPL on the managed server. Type what you want, the agent auto-routes it to the best skill, runs it through the sandbox, and streams the answer back.
# Interactive REPL
managelm-shell
# One-shot
managelm-shell -c "install htop and verify"
# Force a specific skill
managelm-shell
> @services restart nginx
- Auto skill routing — the daemon picks the right skill from your phrasing; use @skill to override.
- Multi-step planner — complex requests are auto-decomposed into sequential steps across skills, and a numbered execution plan is shown before running.
- Streaming output, tab completion, command history, elapsed time, and rich markdown rendering.
- Follow-ups — type > … to continue the previous conversation with full context.
- Changeset & rollback — file writes are snapshotted; changes lists them, rollback #N reverts.
- Interactive tasks — the LLM can pause and ask for information only you can provide (domain, password, licence key) and resume with the answer.
- Linux commands marked interactive run in a PTY with the local LLM driving stdin.
Shell tasks show up in the audit log, Reporting, and webhooks just like portal tasks.
managelm-fixit — Diagnose & fix one file
Point it at any misbehaving file. The agent classifies the content, picks the right skill, diagnoses the issue, and proposes a full-file replacement as a colored diff. Apply on y, reject on N.
# Diagnose, show diff, y/N to apply
managelm-fixit /etc/nginx/nginx.conf
# Diagnosis only, no diff
managelm-fixit --explain /etc/postfix/main.cf
# Auto-apply without prompting
managelm-fixit --yes /var/www/app/config.yaml
# Force a skill and add a hint
managelm-fixit @webserver -c "502 after upgrade" /etc/nginx/nginx.conf
- Content-based routing — the skill is chosen from the file's content, not its path, so it works on any text file (configs, scripts, code).
- Atomic writes with owner, group, and mode preserved; rolled back automatically if the post-fix validator fails.
- Same changeset log as shell tasks — managelm-shell → rollback #N reverts applied fixes.
- Non-zero exit codes distinguish nothing to fix / proposed but not applied / applied / error — suitable for CI or pre-commit hooks.
managelm-review — Read-only review
Where fixit writes, review only reads. Point it at a file or a directory and get a short summary plus a list of findings grouped by severity. Nothing is written to disk.
# Review a single file
managelm-review /etc/ssh/sshd_config
# Review a directory (walks recursively, skips .git, node_modules, …)
managelm-review ./src/
# Only warning + critical
managelm-review --severity warning ./src/
# JSON for CI
managelm-review --format json ./src/
- Findings carry a line number, severity (info / warning / critical) and category (security, bug, style, perf, maintainability).
- Directories are walked safely: noisy dirs skipped, symlinks not followed, hard cap at 20 files, confirmation above 5.
- Exit code 1 when findings are present at or above the severity threshold — drop it into a pre-commit hook or CI stage.
- Ideal companion to fixit: review a directory, then fixit the files that matter.
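That exit-code contract makes a pre-commit hook a one-liner. In the sketch below a shell function stubs managelm-review (pretending findings were present) so the control flow is runnable anywhere; a real hook would invoke the binary directly:

```shell
# Stub standing in for the real managelm-review binary: exits 1, as documented
# for findings at or above the chosen severity threshold.
managelm_review() { return 1; }

blocked=0
if ! managelm_review --severity warning ./src/; then
  echo "review findings at warning+ severity; commit blocked"
  blocked=1   # a real pre-commit hook would `exit 1` here
fi
```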
Quick Reports
Quick reports are one-click diagnostic commands available on the Agent Assets page. They let you run common checks on any online agent without writing instructions.
How it works
- Open the Agent Assets page.
- Each agent card shows small icon buttons below the OS info line — one per available report.
- Click an icon to run the report. A modal opens showing a spinner while the agent executes.
- When complete, the modal displays the LLM summary (a readable interpretation) and the raw terminal output.
- Click Copy to copy both summary and output to your clipboard.
Quick reports require the agents permission. Reports only appear for skills that are assigned to the agent (directly or via a group).
Available reports
Nine built-in skills include quick reports. Each report runs a pre-built instruction on the agent:
| Skill | Report | What it checks |
|---|---|---|
system | System Summary | Hostname, OS, kernel, uptime, load, memory and disk usage |
system | Top Processes | Top 10 processes by CPU and memory |
services | Service Inventory | All services with status and enabled state (systemd on Linux, Windows Services on Windows) |
services | Failed Services | Services in failed state |
packages | Available Updates | Packages with pending updates |
users | User Accounts | All users and groups with UID, GID, home, shell |
network | Listening Ports | All listening TCP/UDP ports with their process |
security | Security Overview | Listening ports, SSH config, fail2ban status |
containers | Container Status | All containers with name, image, status, ports |
storage | Disk Usage | Disk usage for all mounted filesystems |
logs | Recent Errors | Errors and warnings from the system journal (Linux) or Windows Event Log (last 30 min) |
Active task indicators
Agent cards on both the My Agents dashboard and the Agent Assets page display a red badge with a spinning icon when the agent has tasks currently running. The badge shows the number of active tasks (pending, sent, or executing).
Portal UI Guide
| Page | Purpose |
|---|---|
| My Agents | Dashboard showing all agents, their status, LLM info, skills, active task indicators, and a 7-day task activity chart. Add, approve, search, and bulk-manage agents. |
| Agent Detail | Configure an agent: display name, LLM settings, tags, groups, assigned skills, member access. Run tasks and view command history. |
| Agent Assets | Visual server map with agents organized by collapsible group zones, click-to-expand agent cards with 24h metrics and cloud provider metadata, quick report buttons, security audit, system inventory, SSH & sudo access, activity audit, service dependencies, scheduled PDF reports, and bulk select operations. |
| Agent Skills | Import from the built-in catalog, create custom skills, or import/export skill JSON files. |
| Agent Groups | Organize agents into groups (e.g. "production", "staging"). Groups are used in MCP tool names. |
| Users & Roles | Invite team members, assign roles (admin/member), and configure granular permissions. |
| Monitors | Service monitors — track availability and response time of 43 service types (HTTP, TCP, DNS, SMTP, databases, message brokers, VPNs, and more). Sparkline charts, status badges, alert toggles, categorized catalog, test-before-create. |
| Certificates | Certificate management — issue, renew, and revoke TLS certificates via internal CA or Let's Encrypt. Deploy to agents automatically. CRL generation. Daily auto-renewal sweep. |
| System Backups | System Backups — end-to-end encrypted filesystem backups to your own S3 storage (OVH, AWS, R2, B2, Wasabi, Scaleway, MinIO). Streaming downloads, restore to any agent, detach-on-delete, optional service quiesce for consistent database snapshots. |
| Pentests | Automated penetration testing for public-facing agents using nmap, nuclei, testssl.sh, ffuf, subfinder. Credit-based scans with domain verification. Results feed into the Compliance page. Pro/Business plans. |
| Compliance | Compliance framework mapping — automatically projects security audit and pentest results onto CIS Level 1, CIS Docker, SOC 2, PCI DSS, ISO 27001, NIS2, NIST CSF, and HIPAA. Drift detection with in-app and email alerts. Per-framework evidence PDFs. |
| Connectors | External integrations split into two kinds: Cloud Hosting (Azure, AWS, Google Cloud, VMware, Proxmox, OpenStack — credentials, test connections, sync resources, browse discovered cloud inventory with agent matching) and SIEM Integration (Splunk HEC, Elasticsearch _bulk, generic JSON webhook — agents forward task-completion events directly to the destination, per-agent or inherited from an agent group). |
| Request Log | View all MCP task requests across your account with status and output. |
| Audit Log | View a record of all actions taken in your account. |
| Reporting | Browse and export task execution history with date filters, search, pagination, and PDF export. Requires the perm_reports permission (admin/owner always have access). |
| Settings | Profile, Security (passkeys, MFA, SSH public keys, verified domains, sessions), MCP & API (credentials, IP whitelist, API keys, webhooks), PKI & CA (internal CA setup, Let's Encrypt account, DNS-01 providers, certificate defaults), S3 Backups (provider, bucket, credentials, orphan cleanup), Account (plan, LLM defaults, danger zone). |
Skills
Skills are the core security and capability model. Each skill defines:
- Operations — Named capabilities (e.g. restart, install, list) that describe what the skill can do.
- Allowed commands — The exact shell commands the agent can execute (e.g. systemctl, apt).
- System prompt — Instructions for the LLM on how to perform operations.
Management Hints
Each skill assignment (on an agent or a group) supports management hints — free-text contextual instructions injected into the LLM system prompt as an ADMINISTRATOR HINTS block. Use hints to provide server-specific or group-wide context that helps the LLM do its job:
- Custom paths: "PostgreSQL 16 data dir is /data/pg16, config in /etc/postgresql/16/"
- Port overrides: "Nginx runs on port 8080 behind HAProxy"
- Conventions: "Always use sudo -u postgres for database operations"
- Environment notes: "This is a staging server. Safe to restart services during business hours."
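The exact layout of the injected block is not documented; roughly (our illustration only), the agent's LLM would see something like:

```shell
# Illustration only: how hints might appear inside the skill's system prompt.
hints=$(cat <<'EOF'
ADMINISTRATOR HINTS
- Nginx runs on port 8080 behind HAProxy
- Always use sudo -u postgres for database operations
EOF
)
printf '%s\n' "$hints"
```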
Hints can be set at two levels:
| Level | Where to set | Scope |
|---|---|---|
| Per-agent | Agent detail → expand skill → Management Hints | This skill on this specific agent |
| Per-group | Agent Groups → expand skill → Management Hints | This skill on all agents in the group |
Direct per-agent skill assignments take priority over group-inherited ones (including their hints).
Skill Definition Example
Below is an example of a Linux skill definition. Windows skills follow the same structure but with PowerShell cmdlets in allowed_commands.
{
"description": "Manage systemd services",
"operations": [
{
"name": "restart",
"description": "Restart a systemd service"
},
{
"name": "status",
"description": "Get status of a service including recent logs"
}
],
"allowed_commands": ["systemctl", "journalctl"],
"system_prompt": "You are a Linux sysadmin..."
}
Operations are instruction-based — each operation has only a name and description. They describe capabilities for documentation and AI context, not structured parameter schemas. The agent LLM interprets the user's natural-language instruction to determine what commands to run.
Skill Combinations
Many real-world management tasks span multiple skills. Each skill controls a specific domain — when an operation touches several domains, you need all the relevant skills assigned to the agent.
Foundation skills
These five skills are used by almost every management workflow. Consider assigning them to all agents as a baseline:
| Skill | Why it's foundational |
|---|---|
system | System info, hostname, timezone — needed to understand what you're working with. |
files | Read/write config files, set permissions — almost every change touches a file. |
services | Start/stop/restart daemons, manage cron — most installations end with a service reload. |
packages | Install software — any new capability starts with installing a package. |
users | Create accounts, manage SSH keys, sudo — many services need a dedicated user. |
Common multi-skill workflows
Below are typical management tasks and the skills they require. Each example shows what you'd ask Claude and which skills are involved.
Create a new system user with SSH access
"Create user deploy with a home directory, add their SSH key, and set them up with sudo access for systemctl"
| Step | Skill needed |
|---|---|
| Create user account and group | users |
| Create home directory and set ownership | files |
| Add SSH authorized key | users |
| Configure sudoers entry | users |
Skills: users + files
Install and configure Nginx with SSL
"Install nginx, create a site for example.com with Let's Encrypt SSL, and open port 443 in the firewall"
| Step | Skill needed |
|---|---|
| Install nginx package | packages |
| Create site config file | webserver |
| Obtain SSL certificate via certbot | certificates |
| Enable the site and reload nginx | webserver |
| Open ports 80/443 in the firewall | firewall |
Skills: packages + webserver + certificates + firewall
Deploy a Node.js application
"Clone the repo from GitHub, install dependencies, set up a PM2 process, and configure nginx as a reverse proxy"
| Step | Skill needed |
|---|---|
| Create app user and directory | users + files |
| Clone the Git repository | git |
| Install Node.js and npm dependencies | webapps |
| Start the app with PM2 | webapps |
| Create nginx reverse proxy config | webserver |
| Set up SSL certificate | certificates |
Skills: users + files + git + webapps + webserver + certificates
Set up a PostgreSQL database server
"Install PostgreSQL 16, create a database and user for my app, configure backups, and open port 5432 only from 10.0.0.0/24"
| Step | Skill needed |
|---|---|
| Install PostgreSQL packages | packages |
| Start and enable the service | services |
| Create database and DB user | database |
| Edit pg_hba.conf for network access | files |
| Set up a pg_dump cron job | backup |
| Open port 5432 for the subnet | firewall |
Skills: packages + services + database + files + backup + firewall
Docker Compose deployment
"Create a docker-compose.yml for my app stack, start it, and check the container logs"
| Step | Skill needed |
|---|---|
| Create project directory and compose file | files |
| Start compose stack | containers |
| View container logs | containers |
| Open ports in firewall (if needed) | firewall |
Skills: files + containers + firewall
Security hardening
"Harden SSH (disable root login, key-only auth), set up fail2ban, and configure the firewall to allow only SSH and HTTPS"
| Step | Skill needed |
|---|---|
| Edit sshd_config | security |
| Restart sshd | services |
| Install and configure fail2ban | security |
| Set firewall rules (allow 22, 443 only) | firewall |
| Review auth logs | logs |
Skills: security + services + firewall + logs
Set up WireGuard VPN
"Install WireGuard, generate keys, configure a tunnel to 10.0.1.0/24, and open UDP port 51820"
| Step | Skill needed |
|---|---|
| Install WireGuard package | packages |
| Generate keys and create config | vpn |
| Enable IP forwarding (sysctl) | network |
| Open UDP 51820 in firewall | firewall |
| Start and enable the WireGuard service | services |
Skills: packages + vpn + network + firewall + services
Skill assignment strategies
Use server groups to assign skill sets by server role, so you don't have to configure each agent individually:
| Server role | Recommended skills |
|---|---|
| Web server | system, files, services, packages, users, webserver, certificates, firewall, logs, monitoring |
| App server | system, files, services, packages, users, webapps, git, logs, monitoring |
| Database server | system, files, services, packages, database, backup, firewall, storage, logs, monitoring |
| Docker host | system, files, services, packages, containers, network, firewall, storage, logs, monitoring |
| Minimal / read-only | system, logs, monitoring (no write skills — agent can only read) |
Assign only the skills a role actually needs — a database server doesn't need webserver, and a web server doesn't need database. Fewer skills = smaller attack surface.
Policy Rulesets
Rulesets are short markdown policy snippets attached to agents (directly or via groups). Every attached ruleset is concatenated and injected into the LLM system prompt as an unconditional POLICY RULES block — applied to every task, regardless of which skill runs.
Where management hints are advisory context scoped to a single skill ("PostgreSQL data dir is /data/pg16"), rulesets are cross-skill constraints that stay in force for the whole task ("Never restart services between 09:00 and 18:00 UTC", "Never edit files under /etc/pam.d without prior approval"). The prompt includes an explicit refusal rule: if a request would violate a listed policy, the agent refuses instead of executing.
Managing Rulesets
Go to Agent Skills → Rules. Each ruleset has:
- Slug — stable identifier (e.g. `change-window`, `pii-handling`).
- Name — display label.
- Content — markdown policy text, capped at 4 KiB per ruleset.
Attaching Rulesets
| Level | Where | Scope |
|---|---|---|
| Per-agent | Agent detail → Rulesets | This agent only |
| Per-group | Server Groups → Rulesets | All agents in the group |
Rulesets accumulate across attachments — an agent gets the union of everything attached directly plus everything inherited from every group it belongs to (deduplicated by ruleset id). Changes push to affected agents immediately over WebSocket; no restart required.
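The accumulation rule can be sketched as a union keyed by ruleset id (a simplified illustration; field names are assumptions):

```python
def effective_rulesets(direct: list[dict], group_sets: list[list[dict]]) -> list[dict]:
    """Union of directly attached and group-inherited rulesets, deduped by id."""
    seen: dict[int, dict] = {}
    for ruleset in direct + [r for group in group_sets for r in group]:
        seen.setdefault(ruleset["id"], ruleset)  # first occurrence wins
    return list(seen.values())
```

An agent in two groups that both attach `change-window` still gets that policy injected once.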
Creating and editing rulesets requires the `skills` permission (the same gate as creating custom skills).
Server Groups
Groups let you organize agents logically (e.g. by environment, role, or location).
- An agent can belong to multiple groups.
- Groups can be used as MCP targets (e.g. `target: "production"` runs the task on all agents in the group).
- You can control which team members can see which groups via user group access.
- Skills assigned to a group are inherited by all agents in that group (shown as read-only "(via group)" on the agent detail page).
Create and manage groups from the Agent Groups page. Assign agents to groups from the agent detail page or the groups page.
Group-level skill configuration
When assigning skills to a group, you can configure per-skill settings that apply to all agents in the group:
- Management hints — Contextual instructions injected into the LLM prompt. Use for group-wide conventions (e.g. "All webservers use /var/www as document root. Nginx config in /etc/nginx/sites-enabled/").
- LLM model override — Use a specific model for a skill across all agents in the group.
Click the chevron next to a skill in edit mode to expand the configuration panel.
Secrets
Each agent has a local secrets.txt file (/opt/managelm/secrets.txt on Linux, C:\ManageLM\secrets.txt on Windows). This file stores sensitive values that commands might need.
# Example secrets.txt
DB_USER=myapp
DB_PASS=s3cret
API_KEY="my-api-key"
The LLM only ever sees the variable names (e.g. `$DB_PASS`), never the actual values. Secrets never leave the server.
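Conceptually, the agent parses `secrets.txt` and substitutes `$VAR` references just before execution, so only names ever reach the LLM. A simplified sketch — the agent's actual parsing and quoting rules are not specified here:

```python
import re

def load_secrets(text: str) -> dict[str, str]:
    """Parse KEY=value lines, skipping comments and blanks."""
    secrets: dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        secrets[key.strip()] = value.strip().strip('"')
    return secrets

def substitute(command: str, secrets: dict[str, str]) -> str:
    """Replace $KEY references with secret values at execution time."""
    return re.sub(r"\$([A-Za-z_][A-Za-z0-9_]*)",
                  lambda m: secrets.get(m.group(1), m.group(0)), command)
```

So the LLM can emit `mysql -u $DB_USER -p$DB_PASS ...` without ever seeing the credentials.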
LLM Configuration
The LLM is configured from Settings → Account:
- Local LLM (Recommended) — Ollama or LM Studio, running on the agent server or a dedicated LLM host accessible by your agents. Full data privacy.
- Cloud LLM — External provider (Claude, ChatGPT, Gemini, Grok, Groq, Mistral, DeepSeek). Select from a dropdown, pick a model, and enter your API key. Use the Test button to validate the key.
For both Local and Cloud LLM, you can choose the access mode:
- Direct — Agent calls the LLM directly (default).
- Proxied — Agent routes LLM calls through the portal. The API key stays on the portal and is never exposed to agents.
Configuration hierarchy
LLM settings can be overridden at multiple levels (highest priority first):
| Level | Where to set | Use case |
|---|---|---|
| Per-skill override | Agent detail → expand skill → LLM Model Override | Use a specific model for complex skills |
| Per-agent override | Agent detail → Edit → "Override for this agent" | Agent needs a different LLM (local or cloud) |
| Account default | Settings → Account | Default for all agents |
The per-skill config panel also includes management hints for providing contextual instructions to the LLM.
Per-agent overrides offer Local LLM or Cloud LLM options and always use direct access. Agents inherit from the account default unless explicitly overridden.
Default values if nothing is configured:
- LLM API URL: `http://localhost:11434` (Ollama default)
- LLM Model: `llama3.2` (we recommend `qwen3.5:9b` — see model recommendations)
Users & Roles
ManageLM supports team collaboration with role-based access control.
Roles
| Role | Access |
|---|---|
| Owner | Full access. Cannot be removed. One per account. |
| Admin | Full access. Can invite users, manage permissions, edit settings. |
| Member | Limited access based on permissions. Only sees assigned agents and groups. |
Member Permissions
| Permission | Grants access to |
|---|---|
agents | Approve, delete, configure agents and assign skills |
groups | Create, rename, delete groups and assign agents |
skills | Create, import, edit, and delete skills |
logs | View task logs and MCP activity |
reports | Access the reporting dashboard and export reports |
MCP Visibility
All users (including owners and admins) only see agents via MCP that are:
- Explicitly assigned to them (agent detail → Assigned Users), or
- In a group they have access to (Users → user group access).
This ensures MCP access is always explicitly granted, regardless of role.
Skill Restrictions
For delegated admin members, you can optionally block specific skills from being invoked. On the Users & Roles page, expand the Skill restrictions row under a member's permissions and click Edit to add skills to the member's blocklist.
- Empty blocklist — no restriction; the member can invoke any skill the target agent carries.
- One or more skills blocked — the member may invoke every skill except the listed ones. This is a deny-list: new skills added to the catalog later are automatically available until you explicitly block them.
- `base` and any skill flagged as `required` cannot be blocked — they underpin every other skill, so the picker filters them out and the enforcement layer always allows them as a safety net.
- The blocklist applies to every entry point: REST API, MCP tools, chat modal, and resumed / follow-up tasks. For members with any blocklist set, the chat modal hides the `auto` option (the agent's LLM planner picks the skill at runtime with no visibility into the blocklist, so auto-routing could silently escape it).
- Owners and admins are exempt — the editor is hidden for them and the check always passes.
- On the Agents and Server Groups pages, blocked skills render faded with a red restricted badge for members whose blocklist covers them. All admin actions (edit, remove, configure hints) still work normally — the fade is a visual hint only, not a permission gate.
Skill restrictions are a pure portal-side filter — the agent still ships its full effective skill set (agent + group-inherited), and task execution logic is unchanged. Use permissions to gate management actions (creating agents, editing groups) and skill restrictions to gate operational ones (running sensitive skills on agents).
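The deny-list semantics reduce to a small predicate. A sketch (role names and the always-allowed set are taken from the rules above; the function name is an assumption):

```python
ALWAYS_ALLOWED = {"base"}  # plus any skill flagged as required

def may_invoke(skill: str, role: str, blocklist: set[str]) -> bool:
    """Portal-side check: owners/admins are exempt, base/required skills
    always pass, and otherwise everything is allowed unless explicitly
    blocked (deny-list)."""
    if role in ("owner", "admin"):
        return True
    if skill in ALWAYS_ALLOWED:
        return True
    return skill not in blocklist
```

An empty blocklist grants everything, which is why new catalog skills are available until explicitly blocked.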
Enforcement matrix
Where the blocklist is checked, and what happens when the caller is restricted:
| Entry point | Explicit skill | Auto-routing | Resumed follow-up / answer |
|---|---|---|---|
POST /api/tasks | Blocklist check | Rejected for restricted users | — |
POST /api/tasks/:id/follow-up | Blocklist check | Rejected | Blocklist check |
POST /api/tasks/:id/answer | Blocklist check | Rejected | Blocklist check |
| MCP skill tool invocation | Blocklist check | — | — |
MCP answer_task | Blocklist check | Rejected | Blocklist check |
managelm-shell on the managed host | Exempt (local root; no portal user identity) | Exempt | — |
| Portal-initiated scans (security, inventory, SSH & sudo, activity) | Exempt (not skills) | — | — |
Rejected tasks intentionally return the same error text as “skill not assigned to agent”, so the caller gets no hint that a restriction exists.
Inviting Users
- Go to Users & Roles.
- Click Invite User.
- Enter their name and email, select a role, and set permissions.
- They'll receive an email with an invitation link.
Passkeys & MFA
ManageLM supports WebAuthn passkeys for secure authentication.
- Register a passkey from Settings → Security → Passkeys.
- Enable MFA to require a passkey after password login.
- Passkeys can also be used for passwordless login from the login page.
You can register multiple passkeys (e.g. fingerprint + security key) and name them for easy management.
API Keys
API keys allow programmatic access to the portal API for automation and integrations.
- Go to Settings → MCP & API.
- Enter a name, select permissions (Agents, Logs, Skills, Groups, Reports), and optionally set an expiration (30, 90, 180, or 365 days).
- Click Create Key and copy the key (starts with `mlm_ak_`). It's only shown once.
Use the key in the Authorization header:
Authorization: Bearer mlm_ak_...
Each key's effective permissions are the intersection of the key's permissions and the creating user's permissions. If a user is later downgraded, their keys lose access accordingly. Expired keys are automatically cleaned up.
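Because the intersection is computed against the creator's *current* permissions, downgrading a user immediately narrows every key they created. A minimal sketch:

```python
def effective_permissions(key_perms: set[str], user_perms: set[str]) -> set[str]:
    """An API key can never exceed its creator's current permissions."""
    return key_perms & user_perms
```

A key created with `{agents, logs}` by a user who is later reduced to `{logs}` keeps only `logs` access.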
OAuth App Credentials (OpenAI GPT, etc.)
For integrations that require OAuth 2.0 (like OpenAI GPT Actions), set OAUTH_APP_CLIENT_ID and OAUTH_APP_CLIENT_SECRET in your .env file. These identify the application — each user still authenticates individually with their own ManageLM credentials. See the Self-Hosted Docker guide for details.
Security Model
Defense in depth
- Command allowlist — Skills define exactly which commands an agent can run. Enforced in code, not prompts.
- Destructive command guard — Even for allowed commands, the agent blocks catastrophically dangerous argument combinations: `rm` targeting protected root directories (`/`, `/etc`, `/usr`, etc.), `dd` writing to block devices, `mkfs`, `--no-preserve-root`, and `find -delete`.
- Kernel sandbox (opt-in, Linux only) — Landlock + seccomp-bpf confine command subprocesses at the kernel level. Even if a command passes all Python-level checks, the kernel blocks writes outside allowed paths and dangerous syscalls.
- Read-only by default — Agents with no skills (or skills with empty allowlists) can only run safe read-only commands.
- Outbound-only connections — Agents connect to the portal. No inbound ports needed.
- Ed25519 task signing — Every task dispatched to an agent is cryptographically signed. Agents verify the signature before execution.
- Secrets isolation — Secrets stay on the server. The LLM only sees variable names, never values.
- Hash-only storage — Passwords, tokens, and API keys are stored as hashes.
- Rate limiting — Login, registration, and password reset endpoints are rate-limited.
- IP whitelist — Optional CIDR-based IP whitelist for MCP connections.
- Execution limits — Max 10 LLM turns per task, 120s timeout per command, 8000 char output limit.
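A minimal sketch of the destructive-argument guard described above (illustrative only — the agent's real rule set and protected-path list are broader):

```python
PROTECTED_ROOTS = {"/", "/etc", "/usr"}  # illustrative subset

def is_destructive(tokens: list[str]) -> bool:
    """Block catastrophic argument combinations even for allowed binaries."""
    if not tokens:
        return False
    binary, args = tokens[0], tokens[1:]
    if "--no-preserve-root" in args:
        return True
    if binary == "rm":
        # Normalize trailing slashes; skip flag arguments.
        targets = [(a.rstrip("/") or "/") for a in args if not a.startswith("-")]
        return any(t in PROTECTED_ROOTS for t in targets)
    if binary == "dd":
        return any(a.startswith("of=/dev/") for a in args)  # writes to block devices
    if binary.startswith("mkfs"):
        return True
    if binary == "find" and "-delete" in args:
        return True
    return False
```

Note the guard runs *after* the allowlist check: `rm` may be permitted by a skill, yet `rm -rf /` is still refused.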
Always-allowed commands (read-only)
The base skill is auto-assigned to every agent and provides a broad set of read-only commands:
cat head tail less more ls tree grep egrep fgrep find locate wc sort uniq
awk sed cut tr diff comm column paste tac xargs file stat md5sum sha256sum
sha1sum readlink basename dirname realpath uname hostname whoami id uptime
date timedatectl lsb_release arch nproc getconf dmesg last lastlog w who
df du free lsblk lscpu lsmem vmstat iostat top ps pgrep lsof fuser
ip ss netstat dig nslookup host ping traceroute curl wget nc
echo printf which type test true false yes seq sleep cd pwd
Even if the base skill is somehow missing, agents fall back to a minimal safe set: cat head tail ls grep find wc sort echo printf test true false cd pwd which.
Kernel Sandbox
The kernel sandbox is available on Linux agents only. It adds Linux-native confinement to command subprocesses using Landlock and seccomp-bpf. It is opt-in per skill and disabled by default. Each layer can be enabled independently. Windows agents do not use the kernel sandbox.
How it works
When enabled on a skill, every command subprocess runs inside a kernel-enforced sandbox applied via preexec_fn after fork() but before exec(). The agent process itself stays unrestricted — only the command is confined.
Command from LLM
|
+- Layer 1: Injection blocking (Python — blocks $(), eval, etc.)
+- Layer 2: Binary allowlist (Python — must be in allowed_commands)
+- Layer 3: Destructive argument guard (Python — blocks rm -rf /, dd of=/dev/sda)
|
v subprocess.run(preexec_fn=sandbox)
|
+- Layer 4a: Landlock (Kernel — filesystem path confinement)
+- Layer 4b: seccomp-bpf (Kernel — syscall blocklist)
Landlock (filesystem confinement)
Restricts which filesystem paths the subprocess can read, write, and execute from. Uses Linux Landlock LSM (requires kernel 5.13+).
| Access | Default paths | Purpose |
|---|---|---|
| Read | / (everything) | Commands can read system state |
| Write | /etc, /var, /tmp | Config edits, logs, temp files |
| Execute | / (everything) | allowed_commands is the binary gate |
Everything outside the configured write paths is read-only at the kernel level — no userspace bypass possible. File uploads also enforce write paths via Python-level path validation using the same config.
seccomp-bpf (syscall filtering)
Blocks dangerous syscalls that no legitimate agent task should need. Returns EPERM (not kill) for graceful error handling.
| Category | Blocked syscalls |
|---|---|
| Filesystem root | mount, umount2, pivot_root, chroot, move_mount, fsopen, fsconfig, fsmount, fspick, open_tree |
| System control | reboot, kexec_load, kexec_file_load |
| Kernel modules | init_module, finit_module, delete_module |
| Swap | swapon, swapoff |
| Exploit primitives | ptrace, bpf, userfaultfd, perf_event_open |
| System identity | settimeofday, clock_settime, sethostname, setdomainname |
Enabling the sandbox
- Open the Skills page and edit a skill.
- Go to the Sandbox tab.
- Toggle Landlock and/or seccomp-bpf on.
- Customize write paths or blocked syscalls if needed.
- Click Save Changes.
The sandbox is pushed to agents automatically on save. Agents on kernels older than 5.13 (Landlock) or 3.17 (seccomp) gracefully degrade — the sandbox is skipped with a log warning, and commands run unrestricted.
Skill configuration (JSON)
{
"sandbox_landlock": {
"read_paths": ["/"],
"write_paths": ["/etc", "/var", "/tmp", "/opt/myapp"],
"exec_paths": ["/"]
},
"sandbox_seccomp": ["mount", "reboot", "ptrace", "init_module", "..."]
}
Each key is independent — use one or both. Absent key means that layer is off. Catalog skills include recommended sandbox templates that you can use as a starting point.
Requirements
- Landlock: Linux 5.13+ with Landlock LSM enabled (default on most modern distros).
- seccomp-bpf: Linux 3.17+ with `CONFIG_SECCOMP_FILTER=y` (enabled by default).
- Architecture: x86_64 and aarch64 supported.
- No kernel configuration needed — no kernel modules, no extra packages, pure `ctypes` syscalls.
Security Audits
ManageLM includes a built-in security audit and compliance engine that scans your agents for misconfigurations, vulnerabilities, and hardening issues. Audits are fully deterministic — no LLM required. Check commands are defined on the portal and executed on the agent in a read-only sandbox.
How it works
- Trigger — From the Agent Assets page (per agent), the Compliance dashboard (fleet-wide), or via MCP.
- Scan — The agent runs a set of read-only checks on the host.
- Report — Each finding includes a severity, an explanation of the risk, a suggested fix, and a mapping to compliance frameworks (CIS, PCI DSS, HIPAA, ISO 27001, NIS2, NIST CSF, SOC 2). A compliance score (0–100) reflects the overall posture. Installed packages are also matched against known vulnerabilities (see below).
- Results — Findings appear in the Agent Assets audit view and the Compliance dashboard. You receive an in-app notification when the audit completes.
Server context
Each compliance rule has separate severity ratings for public and private servers:
- Public (internet-facing) — stricter ratings. SSH root login = critical, missing firewall = critical.
- Private (internal network) — relaxed ratings. SSH root login = low, missing firewall = low, missing SELinux = low.
What is checked
| Check | What it inspects |
|---|---|
| SSH & RDP config | Root login, password vs. key authentication, retry limits, X11 forwarding, RDP Network Level Authentication. |
| Listening ports | Open TCP and UDP sockets on all interfaces. |
| Firewall | Host firewall status and rules (UFW, firewalld, nftables, iptables, or Windows Firewall profiles). |
| User accounts | Login-enabled users, UID 0 / local administrators, guest account, service accounts. |
| Password policy | Minimum length, complexity, lockout threshold. |
| Windows hardening | UAC enabled, cleartext credential storage disabled (WDigest), automatic login disabled. |
| File permissions | World-writable files, SUID binaries, shadow file readability. |
| Password hashing | Password hashes flagged if they use weak algorithms (MD5 or older). |
| Patch posture | Pending security updates, automatic-update service enabled, pending reboot after kernel or library updates. |
| Installed packages | Full package inventory feeding the vulnerability scan. |
| Authentication events | Failed login attempts in the last 24 hours. |
| Audit & event logging | Audit daemon (Linux) or Advanced Audit Policy (Windows); PowerShell script-block logging. |
| Endpoint protection | Mandatory access control (SELinux / AppArmor) or Windows Defender antivirus including signature freshness. |
| Time synchronization | System clock synchronized via NTP. |
| Kernel hardening | IP forwarding, ICMP redirect handling, reverse-path filtering, ASLR, SUID core dumps. |
| Brute-force protection | Fail2ban status and active jails. |
| TLS/SSL | Weak protocols (SSLv3, TLSv1.0/1.1) and weak ciphers (RC4, DES, NULL, EXPORT, MD5) rejected on all listening services. |
| Certificates | TLS certificate expiry with days remaining. |
| SMB hardening | SMB signing required, legacy SMB1 protocol disabled. |
| Network exposure | LLMNR (legacy name resolution) disabled on Windows. |
| Disk encryption | BitLocker protection on OS and fixed data volumes (Windows). |
| Scheduled tasks | System and per-user cron jobs. |
| SSH authorized keys | SSH key-based access across all users. |
| Docker | Privileged containers, socket exposure, containers running as root. |
| Vulnerability scan | Installed packages matched against known CVEs (see next section). |
Vulnerability scanning
As part of every security audit, ManageLM checks each agent's installed packages against a public vulnerability database and reports any known CVEs that apply to the installed versions. Nothing to install, nothing to configure.
- Coverage — All major Linux distributions (Debian, Ubuntu, Red Hat, Rocky, AlmaLinux, SUSE, openSUSE, Alpine, and others) plus language package managers (Python, npm, Go, Rust, Ruby, Java, .NET, PHP).
- Actively exploited — Vulnerabilities listed in CISA's Known Exploited Vulnerabilities catalog are automatically marked Critical and flagged as "KEV — actively exploited".
- Compliance impact — CVE findings contribute to the agent's compliance score and feed the patch-management controls of CIS, NIST CSF, NIS2, SOC 2, ISO 27001, and PCI DSS.
- Fix suggestions — Each finding includes the package name, installed version, CVE ID, a link to the advisory, and the exact upgrade command for the host's package manager.
Severity levels
| Level | Meaning |
|---|---|
| Critical | Immediate action required — actively exploitable or dangerous misconfiguration. |
| High | Significant risk — should be addressed promptly. |
| Medium | Moderate risk — recommended to fix. |
| Low | Minor issue or informational finding. |
| Pass | Check passed — no issue found. |
Findings
Each finding includes:
- Category — The area being checked (e.g. SSH, firewall, filesystem, users).
- Title — A short description of the check.
- Explanation — Details on what was found and why it matters.
- Remediation — A recommended fix for the issue.
Automated remediation
You can select one or more findings and click Remediate to have the agent automatically fix them. This requires:
- The Security & Hardening skill to be assigned to the agent.
- The agent to be online.
Remediation creates a task that uses the security skill and the agent's LLM to intelligently apply the recommended fixes. The agent backs up configuration files before making changes and validates them before restarting services.
PDF export
Click the Security button at the top of the Agent Assets page to download a fleet-wide security audit report. The PDF includes a summary bar with issue counts by severity, detailed findings with explanations and remediation steps, and a list of passed checks.
Use the Schedules popover in the Agent Assets toolbar to enable automatic report emails (Daily / Weekly / Monthly). Scheduled reports are generated and emailed as PDF attachments to all admin users who have report_ready notifications enabled. Changing the report schedule also sets the same scan schedule on all agents so data stays fresh.
Scheduled audits
You can configure automatic recurring audits per agent. Open the Security Audit modal and use the schedule selector in the top-right corner to choose a frequency:
- Manual only — No automatic scans (default).
- Daily — Runs once every 24 hours.
- Weekly — Runs once every 7 days.
- Monthly — Runs once every 30 days.
The scheduler checks every 15 minutes and triggers audits for agents that are overdue. Agents that have never been scanned are prioritized. A yellow badge (D, W, or M) appears on the agent card to indicate an active schedule.
Constraints
- Only one audit can run per agent at a time.
- Each agent stores its latest audit result. Previous results are archived to history (max one per day, configurable retention via `AUDIT_HISTORY_RETENTION_DAYS`).
- The agent must be online to start an audit (manual or scheduled).
- The `agents` permission is required to start audits, trigger remediation, and change the schedule. All authenticated users can view results.
Service Monitors
Monitor the availability and response time of services running on your agents. Monitors run directly from the agent's network, so they can check internal services (localhost, LAN) as well as public endpoints.
How it works
- Create — Open the Monitors page and click Add Monitor. Pick a service type from the catalog (43 types across 9 categories), select an agent, and configure the check parameters.
- Check — The agent runs the check locally on the configured schedule (1m, 5m, 15m, 30m, or 1h). Five check types are supported: TCP connect, HTTP request, DNS resolution, ICMP ping, and TLS certificate expiry.
- Report — The agent reports results to the portal. Only status transitions (up→down, down→up) and periodic summaries are sent — not every individual check. This keeps DB writes near zero when everything is up.
- Alert — When alerts are enabled, an email is sent to all users assigned to the target agent after a configurable number of consecutive failures (default: 3). A recovery email is sent when the service comes back up.
Service catalog
The monitor catalog (one JSON file per service in monitors/) defines 43 service types organized in 9 categories:
| Category | Services |
|---|---|
| Web | HTTP / HTTPS, REST API, HAProxy, Squid Proxy |
| Network | TCP Port, Ping (ICMP), DNS, NTP |
| SMTP, IMAP, POP3 | |
| Database | MySQL / MariaDB, PostgreSQL, SQL Server, Redis / Valkey, MongoDB, Elasticsearch, Memcached, ClickHouse, InfluxDB, Cassandra, CouchDB |
| Messaging | RabbitMQ, Kafka, NATS, MQTT |
| File Sharing | FTP / SFTP, SMB / CIFS, NFS, AFP, MinIO / S3, WebDAV |
| Remote Access | SSH, RDP, WinRM, OpenVPN, IPsec / IKEv2 |
| Infrastructure | LDAP / LDAPS, Kerberos, Docker API, Consul, Vault, etcd |
| Monitoring | Prometheus, Grafana, Zabbix |
Each service type maps to one of 5 agent check types (TCP, UDP, HTTP, DNS, Ping). TCP and HTTP checks support an SSL/TLS toggle for TLS handshake validation and optional certificate expiry warnings (works with self-signed certificates). Adding a new service is just a JSON file in the monitors/ directory — no code changes needed.
Alerts
Each monitor has an alert toggle and a configurable consecutive failure threshold (default: 3).
- Down alert — Sent when the monitor reaches the failure threshold. Emails all users assigned to the target agent (direct access + group access + admins/owners). An in-app notification and webhook (`monitor.down`) are also fired.
- Recovery alert — Sent when the monitor comes back up after being down. Same recipients. Webhook: `monitor.up`.
- Manual refresh — The Refresh button triggers immediate checks on all monitors but does not fire alerts (prevents false alerts from manual testing).
Test before creating
The Test button in the create/edit modal sends an ad-hoc check to the agent and shows the result immediately (up/down, response time, error) without creating or saving the monitor.
Data & charts
- Response time sparkline — Each monitor in the list shows a mini chart of recent response times (from hourly rollup data).
- Detail modal — Click a monitor to see uptime percentages (24h, 7d, 30d), a full response time chart, and the status change timeline.
- Infrastructure badges — Agent cards in the Agent Assets page show a monitor status badge (e.g. “3/3 up” or “1 down”).
Permissions
- All authenticated users can view monitors, statuses, charts, and history.
- The `perm_monitors` permission (or admin/owner role) is required to create, edit, or delete monitors and to toggle alerts.
- The permission toggle appears in Users & Roles under “Admin permissions”.
MCP integration
Two MCP tools are available for AI assistants:
- `list_monitors` — List all monitors with their current status, response time, agent, and schedule.
- `search_monitors` — Filter monitors by status (down), service type (mysql), agent name, or free text.
Per-Plan Limits
The number of monitors per account is limited by your plan (Free: 20, Pro: 100, Business: 200, Enterprise: unlimited). The Monitors page shows your usage against the limit. The Add Monitor button is disabled when the limit is reached.
Certificates & PKI
Manage TLS certificates for your agents directly from the portal. Two certificate sources are supported:
- Internal CA — Create or import an RSA-4096 Certificate Authority. Issue leaf certificates (ECDSA P-256, RSA-2048, or RSA-4096) signed by your CA. A CRL is automatically generated and served at a public URL.
- Let's Encrypt — Register an ACME account and issue free, publicly-trusted certificates. Two challenge types are available:
- HTTP-01 (default) — The agent handles the challenge automatically on port 80. Requirement: the agent must be reachable on inbound TCP port 80 from the internet.
- DNS-01 — The portal creates a DNS TXT record via your configured DNS provider. Works with any agent (public or private) and supports wildcard certificates (`*.example.com`). Configure DNS providers in Settings → PKI & CA → DNS-01 Providers. Supported providers: Cloudflare, DigitalOcean, Hetzner DNS, OVH.
Setup
- Configure a CA or LE account — Go to Settings → PKI & CA. Create a new internal CA, import an existing sub-CA, or register a Let's Encrypt account. Optionally add DNS-01 providers for DNS-based certificate validation.
- Set defaults — Configure default certificate validity (14–365 days), key type (ECDSA P-256, RSA-2048, RSA-4096), and renewal window (7–90 days before expiry).
- Issue certificates — Go to Certificates, click New Certificate, pick a target agent, and fill in the common name, file paths, and optional SANs.
Certificate Lifecycle
- Issue — The agent generates a keypair and CSR locally — the private key never leaves the agent. The portal validates the CSR, signs it with the internal CA (or submits it to Let's Encrypt via ACME), and sends only the signed certificate back to the agent over WebSocket. The agent writes the cert and key to disk and reloads the target service.
- Renew — Manual via the Renew button, or automatic via the daily renewal sweep. Renewal generates a fresh keypair and CSR on the agent, signs a new certificate, deploys it, then revokes the old one. For LE certs, the old certificate is also revoked on Let's Encrypt's side.
- Revoke — Marks the certificate as revoked and updates the CRL. For LE certs, the revocation is also sent to Let's Encrypt's ACME endpoint. Revoked internal CA certs can be reactivated; LE revocations are permanent.
- Delete — Soft-deletes the certificate (must be revoked, expired, or failed first). The serial stays in the CRL until the certificate's natural expiry date, then is purged by the daily sweep.
CRL & Public Endpoints
The portal serves two public endpoints (no authentication required):
- `/pki/<crl_id>.crl` — DER-encoded CRL signed by the internal CA. Cached in Redis with 7-day validity.
- `/pki/<crl_id>.cer` — DER-encoded CA certificate for trust chain installation.
Both URLs are embedded in issued certificates as the CRL Distribution Point and Authority Information Access extensions.
Auto-Renewal Sweep
A daily background task (Redis-locked for HA) handles certificate lifecycle:
- Expires certificates whose `not_after` has passed.
- Renews active certificates that are within the renewal window and whose agent is online.
- Purges soft-deleted certificates whose serial has naturally expired, and stale failed/pending rows older than 7 days.
- Sends notifications (in-app, email, webhook) on renewal success or failure when alerts are enabled.
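The locking pattern behind the sweep can be sketched as follows. This is an illustrative Python model: `SweepLock` stands in for a Redis `SET NX EX` lock, and the field names (`status`, `not_after`, `agent_online`) are hypothetical, not ManageLM's actual schema.

```python
import time

class SweepLock:
    """In-memory stand-in for the Redis lock that keeps the daily sweep
    single-instance across HA portal nodes (illustrative only)."""
    def __init__(self):
        self._holder, self._expires = None, 0.0

    def acquire(self, instance: str, ttl: float) -> bool:
        now = time.monotonic()
        if self._holder is None or now >= self._expires:
            self._holder, self._expires = instance, now + ttl
            return True
        return False

def renewal_candidates(certs, now, window_days):
    # Renew active certificates inside the renewal window whose agent is online
    return [c["cn"] for c in certs
            if c["status"] == "active"
            and c["agent_online"]
            and c["not_after"] - now <= window_days * 86400]

def run_sweep(lock, instance, certs, now, window_days=30):
    if not lock.acquire(instance, ttl=3600):
        return None          # another portal instance holds the lock this cycle
    return renewal_candidates(certs, now, window_days)
```

Only the instance that wins the lock performs the sweep; the others skip the cycle and retry the next day.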
Permissions
- CA and LE account management is restricted to the account owner.
- The `perm_certificates` permission (or admin/owner role) is required to issue, renew, revoke, and delete certificates.
- All authenticated users can view certificates, their status, and deployment details.
Per-Plan Limits
The number of certificates per account is limited by your plan (Free: 10, Pro: 50, Business: 100, Enterprise: unlimited). The Certificates page shows your usage against the limit. The New Certificate button is disabled when the limit is reached.
MCP Tools
- `list_certificates` — List all certificates with status, agent, expiry, and key type.
- `search_certificates` — Filter certificates by status, source, agent name, or free text.
System Backups
End-to-end encrypted filesystem backups from your agents to your own S3 storage. ManageLM never sees your data — the agent encrypts every archive locally before uploading, and only the ciphertext transits via your S3 bucket. Restore to any online agent at any time.
Providers
The S3 bucket is configured once per account in Settings → S3 Backups. Provider-agnostic — one set of credentials, any S3-compatible storage:
- OVH Object Storage (recommended for EU-based data residency)
- Amazon S3
- Cloudflare R2 (no egress fees)
- Backblaze B2
- Wasabi
- Scaleway Object Storage
- MinIO (self-hosted S3)
Each provider has a one-click preset that pre-fills the endpoint URL. The Test button validates credentials via HeadBucket before saving. Secret keys are stored AES-256-GCM encrypted at rest.
Encryption
Every backup has its own randomly generated 32-byte master key, stored wrapped server-side. Before each run, the portal sends the key to the agent over the existing mTLS WebSocket channel — never over HTTP, never logged.
- AES-256-CBC + HMAC-SHA256 — encrypt-then-MAC pattern, industry standard.
- Domain-separated subkeys — encryption key = `SHA256(master || "enc")`, MAC key = `SHA256(master || "mac")`.
- Wire format — `IV (16 bytes) || ciphertext || HMAC tag (32 bytes)`.
- Restore-time verification — the HMAC is checked at end-of-file using a rolling-window stream; a mismatch destroys the download stream so the browser sees a broken file instead of partial or corrupt bytes.
Pure-Python implementation on the agent via `oscrypto` — no `cryptography` package, no native build dependencies.
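The subkey derivation and wire format above can be modeled in a few lines of Python. This is a minimal sketch of the documented scheme, not the agent's code: the AES-256-CBC step is omitted and an already-encrypted payload is assumed, since only the derivation and framing are shown.

```python
import hashlib
import hmac

def derive_subkeys(master: bytes) -> tuple[bytes, bytes]:
    # Domain separation: distinct subkeys for encryption and authentication
    enc_key = hashlib.sha256(master + b"enc").digest()
    mac_key = hashlib.sha256(master + b"mac").digest()
    return enc_key, mac_key

def frame(iv: bytes, ciphertext: bytes, mac_key: bytes) -> bytes:
    # Wire format: IV (16 bytes) || ciphertext || HMAC-SHA256 tag (32 bytes)
    tag = hmac.new(mac_key, iv + ciphertext, hashlib.sha256).digest()
    return iv + ciphertext + tag

def unframe(blob: bytes, mac_key: bytes) -> tuple[bytes, bytes]:
    iv, body, tag = blob[:16], blob[16:-32], blob[-32:]
    expected = hmac.new(mac_key, iv + body, hashlib.sha256).digest()
    # Constant-time comparison; on mismatch the stream must be destroyed
    if not hmac.compare_digest(tag, expected):
        raise ValueError("HMAC mismatch: archive corrupt or tampered")
    return iv, body
```

Note the encrypt-then-MAC order: the tag covers the IV and ciphertext, so tampering is detected before any decryption is attempted.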
Schedule & Retention
Each backup has its own cadence and retention:
| Schedule | Configurable Fields |
|---|---|
| Every hour | — |
| Every 6 hours | — |
| Daily | Run time (HH:MM, agent-local) |
| Weekly | Day of week + run time |
| Monthly | Day of month (1–31, clamped) + run time |
FIFO retention — specify how many snapshots to keep (1–90). Older snapshots are automatically rotated out by the cleanup cron, which best-effort deletes the S3 object then the DB row.
Quiesce services during backup
For a consistent snapshot of databases and stateful apps, list one or more services to stop during the backup (comma-separated, e.g. `postgresql, redis`). The agent:
- Stops each listed service via `systemctl stop` (Linux) or `net stop` (Windows) — 30-second timeout per service.
- Runs the tar → encrypt → upload pipeline.
- Restarts every service that was successfully stopped — in a `try/finally` so a backup failure (or the agent being killed mid-run) never leaves services down.
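The restart guarantee comes from the `try/finally` shape. A sketch, with the service-control commands abstracted behind callables (the real agent shells out to `systemctl` or `net`; this model is illustrative):

```python
def run_backup_with_quiesce(services, stop, start, do_backup):
    """stop/start wrap `systemctl stop|start` (Linux) or `net stop|start`
    (Windows); do_backup stands in for the tar -> encrypt -> upload pipeline."""
    stopped = []
    try:
        for svc in services:
            stop(svc)            # the agent applies a 30-second timeout here
            stopped.append(svc)
        return do_backup()
    finally:
        # Runs even when do_backup() raises or a stop fails part-way:
        # only services that were actually stopped are restarted.
        for svc in stopped:
            start(svc)
```

Tracking the `stopped` list separately from `services` matters: if the second of three stops fails, only the first service is restarted, and nothing that was never stopped gets a spurious start.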
Run flow
- Agent requests a presigned `PUT` URL from the portal; the portal pre-inserts a `pending` snapshot row.
- Agent tars the source path (with optional excludes), encrypts the archive, and uploads directly to S3 — never through the portal.
- Agent reports size, file count, duration, and SHA-256 via `backup_status`.
- Portal flips the snapshot to `ok`/`failed`; the cleanup cron reaps stuck `pending` rows after 6 hours.
Download & Restore
- Download decrypted `.tar.gz` — mints a one-shot Redis token (5 min TTL), then triggers a native browser download. The portal streams S3 bytes through a rolling-window HMAC/decrypt pipeline; the save dialog opens immediately and the progress bar fills as the archive arrives.
- Restore to any agent — pick a target agent and target path in the Restore modal. The agent downloads, verifies the HMAC before touching disk, strips the top-level source directory, and extracts with a path-traversal guard. You can restore to the original agent or any other online agent — the backup is portable.
- Abort a running backup — cancel a stuck upload and reclaim the snapshot slot.
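The path-traversal guard used during extraction can be illustrated with the standard library. A sketch of the idea, not the agent's actual code:

```python
import os
import tarfile

def safe_extract(tar: tarfile.TarFile, dest: str) -> None:
    """Extract only after verifying that every member stays inside dest."""
    dest_real = os.path.realpath(dest)
    for member in tar.getmembers():
        target = os.path.realpath(os.path.join(dest_real, member.name))
        # Refuse members that would escape the destination (e.g. "../etc/passwd")
        if not (target == dest_real or target.startswith(dest_real + os.sep)):
            raise ValueError(f"blocked path traversal: {member.name}")
    tar.extractall(dest_real)
```

Resolving each member through `os.path.realpath` also defuses tricks like `a/../../etc`, which a plain string prefix check on the raw member name would miss.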
Detach on agent delete
When you delete an agent that has backups, the backups are not deleted — their `agent_id` is cleared instead. The S3 data and snapshot history survive the hardware replacement. A purple Reassign button appears in the backup row; clicking it opens the edit modal with an Agent picker so you can attach the backup to a new agent and continue the schedule. The UI also warns you about the detached count before confirming the agent deletion.
S3 orphan cleanup
The S3 Cleanup button in Settings → S3 Backups scans your bucket under the account prefix and deletes objects that have no matching snapshot row in the portal. Useful when the bucket was deleted externally, credentials were rotated mid-run, or you want to reclaim storage after manually removing backups.
Alerting
Per-backup toggle for alert-on-failure emails. ManageLM also detects stalled backups: if a scheduled backup is missed because its agent is offline, you receive a single consolidated alert per agent rather than one alert per missed run.
Permissions
- Read access — all authenticated users can view the Backups page, snapshot counts, and last-run status.
- Backups Admin (`perm_backups`) — required to create, edit, delete, run now, reassign detached backups, download decrypted snapshots, restore, and configure account S3 settings.
- Owners and admins bypass the permission check.
Per-Plan Limits
The number of backups per account is limited by your plan (Free: 20, Pro: 100, Business: 200, Enterprise: unlimited). Detached backups still occupy a slot — delete them explicitly to free the slot.
Constraints
- Each snapshot is held fully in memory during encrypt/decrypt on the agent (oscrypto one-shot API). Practical upper bound: 4 GB archive size. For larger datasets, split into smaller backups.
- You must configure S3 storage before creating your first backup.
- Restoring to a detached backup requires explicitly picking a target agent.
Pentests
ManageLM includes automated penetration testing for your public-facing agents. Pentests scan your servers from the outside — testing what an attacker would see. Available on Pro and Business plans.
How it works
- Select — Open the Pentests page and click New Pentest. Choose one or more public agents, select the tests to run, and optionally add target URLs.
- Validate — The portal sends a one-time token to the agent. The agent validates with the pentest service from its public IP, proving it controls the target.
- Scan — The pentest service runs tools sequentially: nmap (port discovery), nuclei (vulnerability scanning), testssl.sh (TLS audit), and more depending on selected tests.
- Report — An LLM generates a human-readable report with findings, severity ratings, and a security score (0–100). Results appear in the Agent Assets audit modal (Pentest tab) and the Pentests dashboard.
Available tests
| Test | What it scans | Credits |
|---|---|---|
| Basic Scan | Port discovery (nmap), vulnerability scan (nuclei), TLS quick check (testssl) | 3 |
| Full Port Scan | All 65,535 TCP ports | 3 |
| Vulnerability Scan | Extended nuclei templates (critical/high/medium) | 3 |
| SSL/TLS Audit | Full testssl.sh analysis (per URL) | 1 |
| Web App Scan | Nuclei web templates (per URL) | 3 |
| DNS Audit | SPF, DMARC, DKIM, MX records (per URL) | 1 |
| HTTP Headers | Security headers analysis (per URL) | 1 |
| Directory Scan | Common path discovery with ffuf (per URL) | 2 |
| Subdomain Enum | Subdomain discovery with subfinder (per URL) | 1 |
URL-based tests run once per target URL. Credit cost is calculated as: IP-based test credits + (URL-based test credits × number of URLs).
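Worked out in code, the formula looks like this. The test identifiers are illustrative; the credit values come from the table above:

```python
# Credits per test, from the pricing table (identifiers are illustrative)
IP_TESTS = {"basic_scan": 3, "full_port_scan": 3, "vulnerability_scan": 3}
URL_TESTS = {"ssl_tls_audit": 1, "web_app_scan": 3, "dns_audit": 1,
             "http_headers": 1, "directory_scan": 2, "subdomain_enum": 1}

def credit_cost(selected_tests, target_urls):
    # IP-based credits are charged once; URL-based credits scale with URL count
    ip_cost = sum(IP_TESTS[t] for t in selected_tests if t in IP_TESTS)
    url_cost = sum(URL_TESTS[t] for t in selected_tests if t in URL_TESTS)
    return ip_cost + url_cost * len(target_urls)
```

For example, Basic Scan plus SSL/TLS Audit and HTTP Headers against two URLs costs 3 + (1 + 1) × 2 = 7 credits.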
Credits
Pentests consume credits. Credits are deducted after a successful scan — failed scans are not charged.
- Bundled credits — Pro and Business plans include credits when you first subscribe.
- Purchase more — Click Add Credits in the Pentests page or Settings > Account to buy additional credit packs.
- Balance — Your remaining credits are shown in the Pentests dashboard and in Settings > Account.
Domain verification
Before scanning URLs, you must verify domain ownership. The pentest service generates a DNS TXT record that you add to your domain. Once verified, the domain stays valid for 24 hours before requiring re-verification.
Compliance integration
Pentest results automatically feed into the Compliance page. Each tool produces a pass/fail rule that maps to framework controls (CIS, PCI-DSS, SOC 2, ISO 27001, NIS2, NIST CSF, HIPAA). Pentest rules appear alongside security audit rules in framework coverage views.
Constraints
- Only public agents (internet-facing) can be pentested — the service scans from the outside.
- One pentest per agent at a time.
- Target URLs must DNS-resolve to the agent's public IP.
- The agent must be online to validate the scan token.
- Requires a Pro or Business plan with sufficient credits.
Compliance & Frameworks
The Compliance page maps your security audit results to industry compliance frameworks. ManageLM evaluates your fleet against each framework's controls and shows which pass, fail, or are not covered by the current rule set.
Supported frameworks
| Framework | Version | Description |
|---|---|---|
| CIS Level 1 | v8.0 | Center for Internet Security — essential security hygiene for servers |
| CIS Docker | v1.6 | CIS Docker Benchmark — container runtime security |
| SOC 2 | 2017 | Trust Services Criteria — Security principle technical controls |
| PCI DSS | v4.0 | Payment Card Industry Data Security Standard |
| ISO 27001 | 2022 | ISO/IEC 27001 Annex A — information security controls |
| NIS2 Directive | 2022 | EU Directive 2022/2555 — network and information security measures |
| NIST CSF | v2.0 | NIST Cybersecurity Framework — Protect, Detect, Identify functions |
| HIPAA Security Rule | 2013 | 45 CFR §164.312 — technical safeguards for protected health information |
How controls are evaluated
Each framework control is backed by one or more checks from security audits, pentests, and vulnerability scans. A control passes only when every backing check passes on every agent. If any check fails on any agent, the control fails. Controls with no data yet (no agents scanned) show as not covered.
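A minimal sketch of these semantics (the row shape is illustrative, not ManageLM's schema):

```python
def control_status(check_results):
    """check_results: (check_slug, agent, passed) rows gathered from
    security audits, pentests, and vulnerability scans."""
    if not check_results:
        return "not_covered"   # no agents scanned yet
    if all(passed for _, _, passed in check_results):
        return "pass"          # every backing check passed on every agent
    return "fail"              # any failure on any agent fails the control
```

The strictness is deliberate: a control that passes on nine of ten servers still fails, because the framework requirement applies fleet-wide.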
Compliance dashboard
The Compliance page has two tabs:
Agents tab
- Fleet score — Average compliance score across all agents, with trend charts.
- Issue breakdown — Critical, high, medium, low counts with stacked area chart over time.
- Drift detection — Rules that changed from pass to fail between scans (shown as an alert at the top).
- Per-agent detail — Expand any agent row to see score history, rule results by category, raw check output, and a history slider to compare past states.
- Re-scan — Trigger audits per agent or fleet-wide with the Scan All button (requires reports permission).
Frameworks tab
- Framework list — All frameworks with compliance percentage, progress bar, and icon badge.
- Expand a framework — Shows each control with pass/fail status, reference number, description, and website link.
- Expand a control — Shows the underlying technical checks with fleet-wide pass/fail counts.
- Evidence PDF — Download button on each framework (enabled at ≥ 50% compliance).
Security drift notifications
When a security audit completes and a rule that previously passed now fails, ManageLM detects this as drift. Drift is shown in the Compliance dashboard as an alert. Optionally, admins can enable the Security Drift email notification in Settings > Email Notifications to receive an email with the new issues.
Drift detection only triggers when there is audit history — the first scan for an agent never generates drift alerts.
Evidence PDF export
Each framework has an Evidence PDF button (enabled when compliance is ≥ 50%). The generated PDF is designed for auditors and includes:
- Cover page — Framework name, version, compliance percentage, agent count.
- Scope & methodology — Assessment date, server count, evaluation method, reference URL.
- Control evidence — Each control as a card with status badge, description, technical checks table (pass/fail per check with fleet-wide counts), and for failing controls: per-agent findings with remediation guidance and raw command output as evidence.
- Assessed infrastructure — Table of all servers with compliance score, exposure level (public/private), and last audit date.
- Disclaimer — Technical controls only, not a compliance certification.
The fleet-wide Export PDF button on the Compliance page generates a summary report covering all frameworks.
Adding custom frameworks
Frameworks are defined as JSON files in the `frameworks/` directory. Each file specifies an `id`, `name`, `version`, `description`, `url`, and an array of controls. Each control maps to existing `rule_slugs`. No code changes are needed — drop a new JSON file and restart the portal.
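A minimal framework file might look like this. The control fields `ref` and `title` and the slug value are hypothetical; use `rule_slugs` that exist in your rule set:

```json
{
  "id": "acme-baseline",
  "name": "ACME Internal Baseline",
  "version": "1.0",
  "description": "Example custom framework",
  "url": "https://example.com/baseline",
  "controls": [
    {
      "ref": "1.1",
      "title": "SSH root login disabled",
      "rule_slugs": ["ssh_permit_root_login"]
    }
  ]
}
```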
System Inventory
ManageLM discovers all running services, installed packages, and system components on your agents. Checks are read-only and no skill assignment is required.
How it works
- Trigger — Open an agent's detail panel on the Agent Assets page. Click the clipboard icon to open the System Inventory modal, then click Run Inventory.
- Scan — The agent collects information about the system using a read-only set of checks.
- Structure — The agent's configured LLM categorizes the results and extracts product names and versions. This is the only built-in report that uses the LLM. Without an LLM, a minimal inventory is returned.
- Results — Inventory items appear in the modal, grouped by category.
What is collected
| Check | What it inspects |
|---|---|
| System Info | OS, kernel, uptime, CPU count, memory, disk usage |
| Running Services | All active services (systemd on Linux, Windows Services on Windows) |
| Enabled Services | Services enabled at boot |
| Listening Ports | TCP listening sockets with associated processes |
| Installed Packages | Package list from rpm or dpkg (Linux), or installed programs list (Windows) |
| Package Versions | Explicit version extraction for common packages (nginx, PostgreSQL, Redis, Docker, etc.) |
| Containers | Docker/Podman containers with image, status, and ports |
| Cron Jobs | System and per-user cron jobs |
| Network Interfaces | All network interfaces with addresses |
| Mounted Filesystems | Non-virtual mounted filesystems |
| Hardware Info | CPU model, memory, disks |
| Web Servers | Running web servers (nginx, Apache, Caddy, HAProxy) |
| Databases | Running databases (PostgreSQL, MySQL, MongoDB, Redis, Valkey, Memcached, Elasticsearch) |
| Login Users | Non-system user accounts with shell and group membership |
Categories
Each inventory item is classified into one of these categories:
| Category | Examples |
|---|---|
| `system` | OS version, kernel, CPU, memory, disk |
| `web` | Nginx, Apache, Caddy, HAProxy |
| `database` | PostgreSQL, MySQL, Redis, Valkey, MongoDB, Elasticsearch |
| `mail` | Postfix, Dovecot, OpenDKIM |
| `container` | Docker containers, Podman containers |
| `network` | Network interfaces, listening ports |
| `storage` | Mounted filesystems, disks |
| `security` | Fail2ban, SELinux, firewall |
| `monitoring` | Monitoring agents, metrics collectors |
| `log` | Rsyslog, journald, logrotate |
| `user` | Login user accounts |
| `scheduler` | Cron jobs, systemd timers |
PDF export
Click the Inventory button at the top of the Agent Assets page to download a fleet-wide inventory report covering all agents with completed inventories. The PDF includes categorized service lists with versions and status for each server.
As with security reports, use the Schedules popover to enable automatic email delivery. Changing the schedule also syncs every agent's inventory scan schedule.
Scheduled inventories
You can configure automatic recurring inventories per agent. Open the System Inventory modal and use the schedule selector in the top-right corner to choose a frequency:
- Manual only — No automatic scans (default).
- Daily — Runs once every 24 hours.
- Weekly — Runs once every 7 days.
- Monthly — Runs once every 30 days.
The scheduler checks every 15 minutes and triggers inventories for agents that are overdue. A yellow badge (D, W, or M) appears on the agent card to indicate an active schedule.
Constraints
- Only one inventory can run per agent at a time.
- Each agent stores only its latest inventory result (previous results are overwritten).
- The agent must be online to start an inventory (manual or scheduled).
- The `agents` permission is required to start inventories and change the schedule. All authenticated users can view results.
SSH & Sudo Access
ManageLM includes a built-in access scanner that discovers SSH authorized keys and sudo privileges across your infrastructure. Check commands are defined on the portal (`reports/ssh_keys.json`) and executed on the agent in a read-only sandbox. Fully deterministic — no LLM involved. Discovered SSH key fingerprints are matched against ManageLM user profiles for identity resolution.
How it works
- Trigger — Open an agent's detail panel on the Agent Assets page. Click the SSH & Sudo button to open the access scan modal, then click Scan Access.
- Collect — The portal sends an `ssh_keys_scan_request` to the agent over WebSocket. The agent reads `/etc/passwd`, parses `~/.ssh/authorized_keys` for each user, computes SHA256 fingerprints, and parses `/etc/sudoers` + `/etc/sudoers.d/*` with group membership resolution. No LLM calls.
- Results — The combined data is returned to the portal and displayed in the modal. SSH key fingerprints are matched against public keys registered in ManageLM user profiles (Settings → Security → SSH Public Keys) — matched keys show the user's name in a green badge, unmatched keys show as "Unknown".
What is collected
| Data | Source | Details |
|---|---|---|
| SSH authorized keys | ~/.ssh/authorized_keys | Key type, SHA256 fingerprint, comment, full public key, line number |
| Sudo user rules | /etc/sudoers | Target host, runas user, commands, NOPASSWD flag, source file |
| Sudo group rules | /etc/sudoers + /etc/group | Group rules (e.g. %wheel) expanded to individual users via group membership |
Identity mapping
ManageLM users can register their SSH public keys in Settings → Security → SSH Public Keys. When the access scan discovers a key on a server, its SHA256 fingerprint is matched against registered keys to identify the owner. This creates a complete map of who has access to what and what they can do (SSH + sudo).
- Green badge — Key matched to a ManageLM user profile.
- Gray "Unknown" badge — Key not registered by any ManageLM user.
Sudo rules with NOPASSWD are highlighted in red as a security concern.
The `user@host` comment in `authorized_keys` is unreliable — identity is resolved exclusively via SHA256 fingerprint matching against registered profiles.
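OpenSSH-style SHA256 fingerprints are computed from the base64-decoded key blob, never from the comment, which is what makes the matching reliable. A stdlib sketch:

```python
import base64
import hashlib

def ssh_fingerprint(authorized_keys_line: str) -> str:
    # "ssh-ed25519 AAAA... user@host" -> "SHA256:<base64 digest, no padding>"
    key_blob = base64.b64decode(authorized_keys_line.split()[1])
    digest = hashlib.sha256(key_blob).digest()
    return "SHA256:" + base64.b64encode(digest).decode().rstrip("=")
```

Two `authorized_keys` lines with the same key material but different comments produce the same fingerprint, so a renamed or mislabeled key still matches its registered owner.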
MCP integration
The access scan powers natural-language access management via Claude:
- `search_ssh_keys` — Search SSH keys across your infrastructure. Combines two data sources: registered keys from ManageLM user profiles (with full public key content) and deployed keys found by access scans on servers. Examples: "Who has SSH access to pocmail?", "Get Charly's SSH key", "List unknown SSH keys".
- `search_sudo_rules` — Search sudo privileges from access scan results. Examples: "Show me Charly's sudo authorizations", "List all NOPASSWD sudo rules on production".
- `list_team_members` — List ManageLM users with their roles, permissions, and registered SSH public keys. Used to look up a team member's key before granting access.
- `run_access_scan` — Trigger a fresh scan and wait for results.
- Combined with the `users` skill (`ssh_key` + `sudo` operations): "Give Charly SSH access to pocmail", "Add authorization for Charly to reboot all production servers", "Remove all access for Yoann". Claude automatically looks up the team member's registered SSH key via `search_ssh_keys` before dispatching the task.
PDF export
Click the SSH & Sudo button at the top of the Agent Assets page to download a fleet-wide access report. The PDF includes SSH keys and sudo rules per user per server, with NOPASSWD rules highlighted.
Scheduled scans
Configure automatic recurring scans per agent via the schedule selector in the modal header, or for all agents via the Schedules popover in the Agent Assets toolbar. Frequencies: Manual / Daily / Weekly / Monthly.
Constraints
- Only one scan can run per agent at a time.
- Each agent stores only its latest scan result (previous results are overwritten).
- The agent must be online to start a scan.
- The `reports` permission is required to start scans and change schedules. All authenticated users can view results.
Activity Audit
ManageLM includes a built-in activity audit that tracks user activity on your servers. Check commands are defined on the portal (`reports/activity.json`) with time-window parameters resolved per scan, and are executed on the agent in a read-only sandbox. On Linux there is no auditd dependency — it works on any distribution. On Windows, the audit uses the Windows Event Log. Fully deterministic, no LLM needed.
How it works
- Trigger — Click the Activity tab on an agent card in the Agent Assets page, then click Run Activity Audit.
- Scan — The agent collects activity for the configured time window.
- Parse — Events are normalized, deduplicated, and system accounts are filtered out.
- Identity — Full names (including LDAP/SSSD users) are matched against ManageLM users — matched users appear as green badges.
- Results — Displayed in the Activity Audit modal with dashboard cards and detail tables.
What the report shows
- Login Success — Successful SSH/console logins with user, timestamp, and source IP (from `wtmp`).
- Login Failed — Failed login attempts with user, timestamp, and source IP (from `btmp`).
- Sudo / Elevated Commands — Linux: all commands run via `sudo` (always shows the real user, even after `sudo su -`; supports compressed rotated log files). Windows: elevated PowerShell sessions from the Windows Event Log.
- Files Changed — Config files modified under `/etc` and `/var/spool/cron`, detected by modification time. Common system noise files are filtered.
- Package Changes — Packages installed, updated, or removed. Linux: from `rpm --last` (RHEL/CentOS) or `dpkg` logs (Debian/Ubuntu). Windows: from the Windows Event Log.
- Service Changes — Services started, stopped, or failed (from the systemd journal on Linux, Windows Event Log on Windows).
- Reboots — System reboot events with kernel version.
Time windows
Each audit collects data for a rolling time window:
- Manual runs — last 24 hours
- Daily schedule — last 24 hours
- Weekly schedule — last 7 days
- Monthly schedule — last 30 days
PDF export & scheduled reports
Click the Activity button at the top of the Agent Assets page to download a fleet-wide activity audit report as PDF. Use the Schedules popover to configure automatic report emails.
Constraints
- Only one audit can run per agent at a time.
- Each agent stores only its latest audit result.
- The agent must be online.
- The `reports` permission is required to start audits and change schedules.
Service Dependencies
The Service Dependencies scan discovers cross-server service dependencies across your infrastructure. It shows what each server provides, what it depends on, and highlights connections between managed agents.
How it works
- Trigger — Click the Service Dependencies button at the top of the Agent Assets page.
- Scan — The portal sends a `dependency_scan_request` to all online agents simultaneously. A progress modal shows each agent's scan status in real time.
Collect — Each agent runs a fully deterministic scan (no LLM needed):
- Provides — discovers all listening TCP services via
ss. - Depends on — discovers outbound connections (established TCP) plus config-file parsing for intermittent dependencies.
- All hostnames are resolved to IPs locally on the agent before reporting.
- Provides — discovers all listening TCP services via
- Report — The portal matches dependency IPs against known agent IPs to identify managed vs external connections, and displays a per-agent report.
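The managed-versus-external matching in the report step can be modeled like so (an illustrative sketch, not the portal's implementation):

```python
def classify_dependencies(outbound, agent_ips):
    """outbound: (ip, port) pairs reported by one agent;
    agent_ips: {ip: agent_hostname} for every managed agent."""
    report = []
    for ip, port in outbound:
        owner = agent_ips.get(ip)
        report.append({
            "target": f"{ip}:{port}",
            "kind": "managed" if owner else "external",
            "agent": owner,    # None for unmanaged servers
        })
    return report
```

Because agents resolve hostnames to IPs before reporting, the portal only ever compares IPs, which keeps the matching deterministic.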
What is scanned
| Source | What it finds |
|---|---|
| Established connections | All active outbound TCP connections to non-local IPs |
| Nginx configs | proxy_pass, upstreams, fastcgi_pass, uwsgi_pass, grpc_pass |
| Apache configs | ProxyPass, ProxyPassReverse, RewriteRule [P] |
| HAProxy config | Backend server definitions |
| Caddy config | reverse_proxy targets |
| .env files | DATABASE_URL, REDIS_URL, DB_HOST, SMTP_HOST, and many more |
| Docker Compose | Environment variables with connection strings |
| WordPress | DB_HOST in wp-config.php |
| Database replication | MySQL master-host, PostgreSQL primary_conninfo, Redis replicaof |
| Mail configs | Postfix relayhost and lookup tables, Dovecot auth backends |
| LDAP configs | ldap.conf, sssd.conf, nslcd.conf URI/host directives |
| NFS/CIFS mounts | Network mounts in /etc/fstab |
| Systemd units | Environment variables with connection strings in service files |
| Prometheus | Scrape targets in prometheus.yml |
| DNS resolvers | /etc/resolv.conf nameservers |
| NTP servers | ntp.conf, chrony.conf, timesyncd.conf |
| Syslog targets | Remote syslog destinations in rsyslog configs |
| SNMP traps | Trap sink destinations in snmpd.conf |
| Backup clients | Bacula, Bareos, Borg, Restic server addresses |
| Zabbix agent | Server= directive in zabbix_agentd.conf |
| Generic /etc sweep | URLs with host:port and raw IP:port patterns across all /etc files |
Report format
Each agent's section shows:
- Provides — green badges for each listening service and port.
- Depends on — each dependency with a managed badge (connection to another managed agent) or external badge (connection to an unmanaged server).
- Used by — which other managed agents connect to this server.
Constraints
- All agents must be online to participate in the scan.
- No database storage — results are computed on demand and held temporarily in Redis (2 minutes).
- The `reports` permission is required to run a dependency scan.
- The scan is fully deterministic — no LLM is used.
Connectors
Connectors wire ManageLM up to external systems. They come in two kinds, selectable as tabs in the Add Connector modal:
- Cloud Hosting — pull infrastructure inventory (VMs, volumes, networks, security groups) from a cloud provider and auto-match the VMs to your ManageLM agents.
- SIEM Integration — push task-completion events from your agents out to an external SIEM (Splunk, Elasticsearch, or a generic JSON webhook).
Both kinds share the same permission (`perm_connectors`), the same encryption at rest (AES-256-GCM, requires `ENCRYPTION_KEY`), the same storage table, and the same CRUD pages. What differs is the data flow: cloud connectors pull on a schedule, SIEM connectors push as tasks complete.
Cloud Hosting
Sync your cloud resources (VMs, volumes, networks, security groups) and auto-match them to ManageLM agents by IP address and hostname.
Supported providers
- Microsoft Azure — Service principal auth (tenant ID, client ID, client secret, subscription ID)
- Amazon AWS — Access key auth (access key ID, secret access key, region)
- Google Cloud — Service account auth (project ID, service account JSON key)
- VMware vSphere — Session auth (vCenter URL, username, password). Requires vSphere 7.0+
- Proxmox VE — API token auth (API URL, token ID, token secret)
- OpenStack — Keystone v3 password auth (auth URL, username, password, project, domain, region). Defaults to OVH endpoint
How it works
- Go to Connectors in the sidebar and click Add Connector.
- On the Cloud Hosting tab, select a provider, enter a name, and fill in your credentials.
- Click Save — the connector syncs automatically on creation.
- Use Edit → Test Connection to verify credentials at any time.
- Cloud resources appear in the connector's expanded view and on agent cards in the Agent Assets page.
What is synced
- VMs / Instances — name, status, IPs, instance type, availability zone, security groups, tags
- Volumes / Disks — name, size, type, encryption, attachment
- Networks — VPCs, subnets, CIDRs
- Security Groups / Firewalls — rules with direction, protocol, ports, source/destination
Agent matching
After each sync, ManageLM automatically matches cloud VMs to agents by comparing IP addresses and hostnames. Matched agents show a provider badge (e.g. AWS, Azure) on their card in the Agent Assets page. Expanding an agent card shows the full cloud metadata (instance type, zone, IPs, disks, security groups, tags).
MCP integration
Claude can query your cloud inventory using three built-in tools:
- `list_connectors` — list configured cloud connectors with sync status and resource counts
- `search_cloud` — search VMs, volumes, networks, and security groups across all providers
- `get_cloud_info` — detailed info for a single cloud resource with linked agent data
These tools are hidden until at least one cloud connector exists — a SIEM-only tenant will not see them in Claude's tool catalog.
Sync schedule
Each connector syncs on a configurable interval: every 1 hour, 6 hours, 12 hours, or 24 hours. Manual sync is available from the connector list (refresh icon). Syncs are distributed across portal instances using Redis locks to prevent duplicates.
Security
- Connector credentials are AES-256-GCM encrypted at rest (requires the `ENCRYPTION_KEY` env var).
- Non-secret fields (region, project ID, URLs) are stored separately and visible when editing.
- All cloud API calls have 30-second timeouts and SSRF protection (private IP ranges blocked).
- Error messages are sanitized to prevent credential leakage in logs or UI.
SIEM Integration
Forward task-completion events from your agents directly to an external SIEM. Useful for compliance, centralized security monitoring, and audit trails outside ManageLM's own database. Forwarding is additive — the portal's own task log and audit trail are unchanged.
Supported destinations
- Splunk HEC — HTTP Event Collector. Needs the HEC URL and a token. Optional Splunk index and sourcetype. Pasted tokens may include a Splunk prefix — ManageLM strips it automatically.
- Elasticsearch (_bulk) — needs the cluster URL, an index name, and a base64-encoded API key. The ApiKey prefix is stripped if pasted in.
- Generic JSON Webhook — POSTs the event envelope as JSON to any URL. Optional raw Authorization header (e.g. Bearer <token>) and optional HMAC-SHA256 secret (the agent signs the raw body; the digest is sent in X-ManageLM-Signature: sha256=<hex> so you can verify integrity on the receiving end).
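For the generic webhook, your receiver can verify the signature by recomputing the HMAC over the raw request body. A minimal sketch: the header name and the sha256=<hex> format come from the description above; everything else (framework, secret storage) is up to your receiver.

```python
import hashlib
import hmac

def verify_signature(raw_body: bytes, header_value: str, secret: str) -> bool:
    """Check an X-ManageLM-Signature header ("sha256=<hex>") against the raw body."""
    if not header_value.startswith("sha256="):
        return False
    expected = hmac.new(secret.encode(), raw_body, hashlib.sha256).hexdigest()
    # constant-time comparison to avoid timing side channels
    return hmac.compare_digest(header_value[len("sha256="):], expected)
```

Compute the HMAC over the exact bytes received, before any JSON parsing or re-serialization, or the digests will not match.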
What gets forwarded
One event per completed task — the same rows you see in the Command History panel of an Agent's detail page. Nothing else is forwarded: no heartbeats, no config pushes, no LLM traffic.
{
"ts": "2026-04-17T14:23:11Z",
"agent": { "hostname": "prod-web-01" },
"task": {
"id": "...",
"skill": "firewall",
"instruction": "block 1.2.3.4",
"status": "completed",
"output": "...",
"error": null,
"files_changed": ["/etc/nftables.conf"]
}
}
Splunk wraps this in {"event": <envelope>, "sourcetype": "...", "index": "...", "host": "..."}. Elasticsearch sends it as an NDJSON _bulk body (action line + doc line). Webhook sends a JSON array of envelopes per batch.
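For reference, the Elasticsearch variant pairs each envelope with an action line in NDJSON. A sketch of how such a batch body is assembled (the index name is illustrative; the portal's actual default is not documented here):

```python
import json

def to_bulk_body(envelopes, index="managelm-tasks"):
    """Build an NDJSON _bulk body: one action line plus one doc line per event."""
    lines = []
    for env in envelopes:
        lines.append(json.dumps({"index": {"_index": index}}))
        lines.append(json.dumps(env))
    # Elasticsearch requires _bulk bodies to end with a newline
    return "\n".join(lines) + "\n"
```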
How it works
- Go to Connectors in the sidebar and click Add Connector.
- Switch to the SIEM Integration tab, pick a type, enter a name, fill in the endpoint and credentials, and save. A Test Connection runs automatically on create.
- Open an Agent detail page — or a Server Group — and pick the new SIEM from the SIEM Forwarding dropdown.
- From that point on, every task completed by that agent fires a POST to the SIEM, in parallel with the normal task-result report to the portal.
Assignment and inheritance
Each agent has at most one SIEM destination. It resolves as:
- If the agent itself has a direct override → that destination wins.
- Else if its group(s) point at a single destination → inherit that one.
- Else → no forwarding.
If an agent belongs to several groups whose SIEM settings differ, the portal refuses to guess — the agent gets a red SIEM CONFLICT badge on the Agent Assets list until you set an explicit per-agent override to resolve the conflict.
Agent groups show their SIEM destination as a small → <connector name> pill on the group card (read-only view).
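The resolution order described above can be sketched as a small function (data shapes hypothetical):

```python
def resolve_siem(agent_override, group_destinations):
    """Resolve an agent's SIEM destination.

    agent_override: connector id or None (direct per-agent setting)
    group_destinations: connector ids from the agent's groups
    Returns the connector id, None for no forwarding, or "CONFLICT".
    """
    if agent_override is not None:
        return agent_override            # direct override always wins
    distinct = {d for d in group_destinations if d is not None}
    if len(distinct) == 1:
        return distinct.pop()            # single inherited destination
    if len(distinct) > 1:
        return "CONFLICT"                # surfaced as the red SIEM CONFLICT badge
    return None                          # no forwarding
```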
Transport and reliability
- Fire-and-forget. The agent enqueues the event into a bounded in-memory queue (512 events) and a background worker POSTs it. Task execution never blocks on SIEM delivery.
- Circuit breaker. After 5 consecutive failures the agent backs off for 60s before retrying — a down SIEM does not get hammered.
- No persistence. If the agent is restarted while the queue has events, those events are dropped. The portal's own task log always has the same data — SIEM forwarding is a fan-out, not a queue-of-record.
- Config changes flush the queue. Reassigning an agent from Splunk A to Splunk B discards events still destined for A, preventing cross-tenant leakage.
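A condensed sketch of the delivery pattern described above. The queue size, failure threshold, and backoff come from the text; class and method names are illustrative, not the agent's actual code:

```python
import collections
import time

class SiemForwarder:
    """Bounded fire-and-forget queue with a simple circuit breaker."""

    def __init__(self, post, max_queue=512, failure_threshold=5, backoff=60):
        self.post = post                     # callable that delivers one event
        self.queue = collections.deque(maxlen=max_queue)  # oldest dropped when full
        self.failure_threshold = failure_threshold
        self.backoff = backoff
        self.failures = 0
        self.open_until = 0.0                # breaker is open until this timestamp

    def enqueue(self, event):
        self.queue.append(event)             # never blocks task execution

    def drain(self, now=None):
        now = time.monotonic() if now is None else now
        if now < self.open_until:
            return                           # breaker open: back off, don't hammer the SIEM
        while self.queue:
            event = self.queue[0]
            try:
                self.post(event)
                self.queue.popleft()
                self.failures = 0
            except Exception:
                self.failures += 1
                if self.failures >= self.failure_threshold:
                    self.open_until = now + self.backoff
                return

    def reconfigure(self):
        self.queue.clear()                   # destination change flushes pending events
```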
Security
- SIEM tokens and HMAC secrets are AES-256-GCM encrypted at rest, decrypted only to build the per-agent config. They are never logged.
- The Test Connection button runs from the portal — private SIEMs behind NAT will therefore fail the portal-side test even if agents can reach them; the actual forwarding still works. (A future iteration may route the test through a chosen agent.)
- The generic webhook supports HMAC-SHA256 signing so your receiver can drop unsigned or tampered events.
- Because events travel agent → SIEM directly (not via the portal), customer event data never transits the ManageLM SaaS — relevant for data-residency and compliance requirements.
Permissions
Creating, editing, or deleting a SIEM connector requires the Connectors permission (perm_connectors) — the same gate as cloud connectors. Assigning a SIEM destination to an agent also requires the Agents permission; assigning one to a group requires the Groups permission.
Permissions (shared)
The Connectors permission (perm_connectors) covers both kinds. Owners and admins have full access. Members need the permission toggled on in Users & Roles.
Change Tracking
ManageLM automatically tracks file changes made by every mutating task. Each agent maintains a local git repository that snapshots tracked directories before and after task execution, producing a precise record of what changed, when, and by which task.
How it works
- Pre-snapshot — Before a task executes, the agent syncs all tracked files into its local git repo and commits a baseline.
- Task execution — The task runs normally (LLM-driven commands).
- Post-snapshot — After the task completes, the agent syncs again, commits the delta, and computes the list of changed files.
- Report — Changeset metadata (files changed, commit hashes, summary) is sent to the portal and stored in the database. The full diff stays in the agent’s local repo.
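Conceptually, the pre/post snapshot reduces to diffing two content maps. The real implementation commits to a dulwich git repo; this hash-based sketch only illustrates the idea of computing the changed-file list:

```python
import hashlib
from pathlib import Path

def snapshot(root: str) -> dict:
    """Map each tracked file path to a content hash (symlinks skipped)."""
    result = {}
    for p in Path(root).rglob("*"):
        if p.is_file() and not p.is_symlink():
            result[str(p)] = hashlib.sha256(p.read_bytes()).hexdigest()
    return result

def changed_files(pre: dict, post: dict) -> list:
    """Paths added, removed, or modified between two snapshots."""
    paths = set(pre) | set(post)
    return sorted(p for p in paths if pre.get(p) != post.get(p))
```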
What is tracked
| Aspect | Detail |
|---|---|
| Tracked directories | /etc/ — covers SSH, nginx, firewall, cron, sudoers, sysctl, network config, and more |
| Skipped content | Binary files, files > 512 KB, symlinks, and noisy directories (ssl/certs, pki/ca-trust, firmware, kernel, selinux/targeted/policy) |
| Git implementation | dulwich (pure Python) — no git CLI needed on the host |
| Repo location | /opt/managelm/git/ on each agent |
| Retention | 30 days — older commits are automatically pruned daily |
Viewing changes
When a task modifies tracked files, a changeset badge appears on the task in the task log (in the Agent Detail page and the MCP Log). The badge shows the number of files changed.
With MCP (Claude), use the built-in get_task_changes tool to inspect what a task modified:
get_task_changes(task_id="...", full_diff=true)
This returns:
- List of changed file paths
- A summary (e.g. “Modified 3 files in /etc/nginx/, /etc/ssh/”)
- Optionally, the full unified diff (when full_diff=true) — fetched on demand from the agent’s local repo
Reverting changes
If a task made unwanted changes, you can revert them to restore the previous file state. Use the revert_task MCP tool:
revert_task(task_id="...")
This fetches the diff from the agent’s local git repo and applies a reverse patch, restoring the files to their pre-task state. The revert is tracked as a separate changeset.
Non-mutating tasks
Tasks classified as read-only (non-mutating) by the LLM skip the snapshot process entirely — no changeset is created. This keeps the git history clean and avoids unnecessary I/O for read-only operations like status checks and log queries.
Audit Log
The Audit Log provides a chronological record of all administrative actions performed in your account. It is accessible from the Audit Log entry in the sidebar.
What is logged
Every significant action is automatically recorded, including:
| Category | Actions |
|---|---|
| Authentication | Login, logout |
| Users | Invite, update role/permissions, delete, transfer ownership |
| Agents | Approve, delete, update settings, bulk actions |
| Skills | Create, import, update, delete, document upload/delete |
| Groups | Create, update, delete, member changes |
| Webhooks | Create, update, delete |
| API Keys | Create, delete |
| MCP | Configuration changes (IP whitelist, etc.) |
| Account | Settings changes, license activation/removal |
Log entry details
Each entry records:
- Timestamp — When the action occurred (displayed in your configured timezone).
- User — Who performed the action (name and email).
- Action — The action type (e.g. agent.approved, skill.created, user.login).
- Target — The affected resource name (e.g. agent hostname, skill name, user email).
- IP address — The client IP address.
Access control
- Owners and admins see all log entries across the account.
- Members with the logs permission see all entries.
- Members without the logs permission only see their own actions.
Features
- Search filter — Filter entries by user name, email, action type, target name, or IP address.
- Pagination — Results are paginated (50 per page) with navigation controls.
Reporting
The Reporting page provides a historical view of all task executions across your account. Use it to review what was run, by whom, on which agent, and what the outcome was.
Access
Reporting is visible to admin and owner roles by default. Members need the perm_reports permission enabled (configurable in Users & Roles).
Features
- Date range — pick a start and end date to narrow the report window (defaults to the last 30 days).
- Search filter — client-side text filter across user, agent, instruction, and summary fields.
- Expandable rows — click any entry to see full details: task ID, timestamps, the original request, and the agent-generated summary.
- Pagination — results are paginated (50 per page) with navigation controls.
- PDF export — click Export PDF to download a professional report covering the selected date range, including statistics and a task table.
Agent summaries
Each task entry may include a summary — a short description auto-generated by the agent’s LLM after completing the task. These summaries make it easy to scan results without reading raw command output.
Webhooks
Get notified when things happen in your account.
Available events
| Event | Fires when |
|---|---|
| agent.enrolled | A new agent requests enrollment |
| agent.approved | An agent is approved |
| agent.online | An agent connects |
| agent.offline | An agent disconnects |
| task.completed | A task finishes successfully |
| task.failed | A task fails |
| report.completed | A security audit or inventory scan completes |
| report.failed | A security audit or inventory scan fails |
| monitor.down | A service monitor goes down (after consecutive failure threshold) |
| monitor.up | A service monitor recovers from down |
| cert.issued | A new certificate is issued and deployed to an agent |
| cert.revoked | A certificate is revoked (CRL updated, LE notified for LE certs) |
| cert.renewed | A certificate is automatically renewed by the daily sweep |
| cert.renewal_failed | Automatic certificate renewal failed |
| cert.reactivated | A revoked certificate is reactivated (internal CA only) |
| cert.deleted | A certificate is soft-deleted from the portal |
Configure webhooks from Settings → MCP & API. Enter a URL, select events, and optionally provide an HMAC secret. Payloads are signed with HMAC-SHA256 via the X-Webhook-Signature header when a secret is configured.
Delivery retries up to 3 times with exponential backoff. After 10 consecutive failures, the webhook is automatically disabled. Re-enabling it resets the counter. Maximum 25 webhooks per account.
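The retry behavior can be sketched as follows. The three-attempt limit comes from the text; the exact delay schedule is not documented here, so the doubling base delay is an assumption:

```python
import time

def deliver_with_retries(post, payload, max_attempts=3, base_delay=1.0, sleep=time.sleep):
    """Attempt a webhook delivery up to max_attempts times with exponential backoff.

    Returns True on success, False once all attempts are exhausted.
    """
    for attempt in range(max_attempts):
        try:
            post(payload)
            return True
        except Exception:
            if attempt < max_attempts - 1:
                sleep(base_delay * (2 ** attempt))  # e.g. 1s, then 2s
    return False
```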
In-App Notifications
The portal includes a real-time notification system accessible from the Notifications bell in the sidebar. Notifications are delivered alongside email alerts for key events.
Notification triggers
- Agent enrolled — A new agent requests approval (account-wide, visible to all admins).
- Agent approved — An agent is approved (account-wide).
- Agent updated — An agent applied an auto-update (account-wide).
How it works
- An unread count badge appears on the notification bell when new notifications arrive.
- Click the bell to open the dropdown panel. All unread notifications are marked as read automatically.
- Notifications link to the relevant page (e.g. agent detail, request log).
- The bell polls for new notifications every 30 seconds.
- Use the clear button to remove read notifications from the list.
Deployment & .env
The portal is configured via environment variables in a .env file. Below is a reference of all available settings.
Core
| Variable | Required | Default | Description |
|---|---|---|---|
| DATABASE_URL | Yes | — | PostgreSQL connection string |
| SERVER_PORT | No | 3000 | HTTP listen port |
| SERVER_URL | Yes | — | Full public URL (e.g. https://portal.example.com) |
| ACCESS_TOKEN_TTL | No | 86400 | Access token lifetime in seconds (24h). Tokens are opaque random strings stored in Redis — no signing secret. |
| REFRESH_TOKEN_TTL | No | 2592000 | Refresh token lifetime in seconds (30d). |
| DEFAULT_TIMEZONE | No | UTC | Default timezone for new users |
| TASK_TIMEOUT_SECONDS | No | 300 | Max duration for synchronous task execution (seconds) |
| FILE_TRANSFER_MAX_BYTES | No | 26214400 | Max file transfer size (default 25 MB) |
| TOS_URL | No | — | URL to Terms of Service page. When set, signup forms require ToS acceptance. |
| LOG_LEVEL | No | info | Log verbosity: trace, debug, info, warn, error, fatal, silent |
| CLUSTER_WORKERS | No | 2 | Number of Node.js cluster workers. Set to 1 to disable clustering. |
| SERVER_MODE | No | selfhosted | saas = hosted SaaS (trial LLM available), selfhosted = Docker/on-prem (proxied LLM available). |
| NOTIFY_EMAIL | No | — | Email address for platform operator alerts (account created/deleted notifications). |
| ENCRYPTION_KEY | No | — | AES-256 key for encrypting connector credentials at rest (cloud provider secrets and SIEM tokens). Required to use Connectors. Generate with: openssl rand -hex 32 |
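A minimal working .env for a self-hosted deployment might look like this (hostnames and credentials are placeholders, not defaults):

```shell
# .env — minimal self-hosted configuration (values are placeholders)
DATABASE_URL=postgres://managelm:secret@localhost:5432/managelm
REDIS_URL=redis://localhost:6379
SERVER_URL=https://portal.example.com
SMTP_FROM=portal@example.com
# optional, but required to use Connectors (generate with: openssl rand -hex 32)
ENCRYPTION_KEY=replace-with-64-hex-chars
```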
SMTP & DKIM
| Variable | Required | Default | Description |
|---|---|---|---|
| SMTP_HOST | No | — | SMTP server hostname. When empty, emails are sent directly to recipient MX servers (no mail server required). |
| SMTP_PORT | No | 25 | SMTP server port |
| SMTP_FROM | Yes | — | From address for all emails |
| SMTP_SECURE | No | none | none = plain (localhost:25), starttls = upgrade via STARTTLS (587), tls = implicit TLS (465) |
| SMTP_USER | No | — | SMTP auth username (for external relays) |
| SMTP_PASS | No | — | SMTP auth password |
| DKIM_DOMAIN | No | — | Domain for DKIM signing (e.g. example.com) |
| DKIM_SELECTOR | No | default | DKIM selector (matches DNS TXT record) |
| DKIM_PRIVATE_KEY_PATH | No | — | Path to PEM private key file |
| DKIM_PRIVATE_KEY | No | — | Inline PEM private key (use \n for newlines) |
When DKIM_DOMAIN and a private key are set, all outgoing emails are signed with DKIM (RSA-SHA256). You also need to publish a DNS TXT record at {selector}._domainkey.{domain} with the matching public key.
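The DNS record follows the standard DKIM TXT layout. For example, with DKIM_DOMAIN=example.com and DKIM_SELECTOR=default it would look like this (public key truncated; the p= value is the base64 of your RSA public key):

```
default._domainkey.example.com.  IN  TXT  "v=DKIM1; k=rsa; p=MIIBIjANBgkq..."
```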
Redis (required)
| Variable | Required | Default | Description |
|---|---|---|---|
| REDIS_URL | Yes | — | Redis connection URL (e.g. redis://localhost:6379). Supports redis://, rediss://, valkey://, valkeys:// schemes. |
| REDIS_TLS | No | auto | auto = TLS if URL uses rediss:// or valkeys://, on = force TLS, off = no TLS |
| REDIS_DB | No | 0 | Logical database number (0–15). Useful when sharing a Redis instance. |
Redis is a mandatory component used for:
- MCP session persistence — sessions survive portal restarts.
- Cross-instance messaging — pub/sub for cache invalidation, agent updates, and tool change notifications.
- In-app notifications — ephemeral per-user notification storage.
- Distributed locks — ensures background maintenance runs on only one instance.
- Horizontal scaling — multiple portal instances share state.
Database
| Variable | Required | Default | Description |
|---|---|---|---|
| DB_POOL_MAX | No | 20 | Max PostgreSQL connection pool size |
| DB_SSL | No | none | none = no SSL, require = SSL (skip cert verify), verify = full CA verification, verify-ca = custom CA cert |
| DB_SSL_CA | No | — | Path to CA certificate file (used with DB_SSL=verify-ca) |
| TASK_LOG_RETENTION_DAYS | No | 30 | Days to keep task log entries |
| AUDIT_LOG_RETENTION_DAYS | No | 90 | Days to keep audit log entries |
| TASK_LOG_MAX_PER_ACCOUNT | No | 5000 | Max task log entries per account |
| AUDIT_LOG_MAX_PER_ACCOUNT | No | 10000 | Max audit log entries per account |
| SESSION_RETENTION_DAYS | No | 30 | Days before inactive login sessions are deleted |
| PENDING_AGENT_RETENTION_DAYS | No | 14 | Days before unapproved agent enrollments are deleted |
| EMAIL_VERIFY_RETENTION_DAYS | No | 7 | Days before stale email verification tokens are cleared |
| MONITOR_RETENTION_DAYS | No | 90 | Days to keep monitor events. Rollups are kept 4× longer for trend charts. |
Performance notes
The portal includes several built-in performance optimizations for high-load deployments:
- MCP tool caching — Generated tool lists are cached per account (60s TTL), automatically invalidated when skills are created, modified, or deleted. Assigning or removing skills from agents/groups does not change the MCP tool list.
- Auth caching — MCP authentication results are cached (30s TTL) to avoid repeated DB lookups.
- Batched heartbeats — Agent heartbeat writes are batched and flushed every 5 seconds instead of one DB write per heartbeat.
- Webhook caching — Webhook configurations are cached per account (60s TTL).
- Optimized queries — MCP tool calls use a single combined query instead of multiple sequential lookups.
- Database indexes — Indexes on hot-path columns (token hashes, access control tables, agent lookups).
For high-traffic deployments, increase DB_POOL_MAX and configure REDIS_URL for session persistence and horizontal scaling.
Background Maintenance
The portal automatically cleans up stale data and triggers scheduled work using the background tasks listed below. Each runs on a distributed Redis lock, so only one portal instance executes per interval — no external cron is needed.
| Task | Interval | What it does |
|---|---|---|
| OAuth cleanup | Every 30 min | Deletes expired MCP OAuth tokens and authorization codes |
| Log purge | Every 1 hour | Age-based and count-based pruning of task_log and audit_log |
| Maintenance | Every 6 hours | Cleans all other stale resources (see table below) |
| Scheduled scans | Every 15 min | Triggers security audits and system inventories for agents with a configured schedule (daily/weekly/monthly), and generates scheduled PDF reports for accounts |
All tasks also run once on portal startup.
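The lock pattern is the standard Redis SET NX EX idiom: whichever instance sets the key first runs the task; the TTL releases it for the next interval. A sketch against a minimal stand-in client (the portal's actual key names and TTLs are not documented here):

```python
class FakeRedis:
    """Tiny stand-in for a Redis client supporting SET key val NX EX ttl."""
    def __init__(self):
        self.store = {}
    def set(self, key, value, nx=False, ex=None):
        if nx and key in self.store:
            return None          # redis-py returns None when NX fails
        self.store[key] = value
        return True

def run_once_across_instances(redis, task_name, instance_id, fn, ttl=3600):
    """Run fn only on the instance that wins the lock for this interval."""
    if redis.set(f"lock:{task_name}", instance_id, nx=True, ex=ttl):
        fn()
        return True
    return False                 # another instance already holds the lock
```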
Maintenance targets
| Resource | Cleanup rule | Configurable |
|---|---|---|
| Login sessions | No activity in SESSION_RETENTION_DAYS (default 30) | Yes |
| Expired invitations | Past expires_at and not accepted | — |
| Expired API keys | Past optional expires_at | — |
| Password reset tokens | Past password_reset_expires_at | — |
| Email verification tokens | Unverified accounts older than EMAIL_VERIFY_RETENTION_DAYS (default 7) | Yes |
| WebAuthn challenges | User inactive > 6 hours (abandoned registration flow) | — |
| Pending agent enrollments | Unapproved for PENDING_AGENT_RETENTION_DAYS (default 14) | Yes |
| Monitor events | Older than MONITOR_RETENTION_DAYS (default 90) | Yes |
| Monitor rollups | Older than 4× MONITOR_RETENTION_DAYS (default 360 days) | Yes |
| PKI certificates | Soft-deleted certs after natural expiry, expired certs after 7 days, stale failed/pending after 7 days | Yes |
Configurable retention values can be set via environment variables in .env. See the Deployment & .env section for details.
Reinstalling an Agent
You can reinstall an agent without losing its configuration (skills, groups, members).
- Go to the agent's detail page.
- Click the Reinstall button.
- Copy the install command and run it on the server.
- Approve the re-enrollment when prompted.
The agent gets a fresh access token and signing key while keeping all its existing configuration intact.
Custom Skills
You can create your own skills to extend what agents can do.
- Go to Agent Skills and click Create Skill.
- Define the skill's slug, name, and description.
- Add operations (name and description for each capability).
- Set the allowed commands.
- Write a system prompt that guides the LLM.
Tips for custom skills
- Be specific with allowed commands. Only permit what's needed.
- Write clear system prompts. Tell the LLM what it is, what it can do, and any constraints.
- Write descriptive operations. Each operation's description helps the LLM understand the skill's capabilities.
- Test with the UI first before connecting Claude, using the Run Task button on the agent detail page.
Import / Export
Skills can be exported as JSON files and imported into other accounts. Use the export button on any skill, or import from the skills page.
Skill Documents (RAG)
You can upload reference documentation to any skill. When a task is dispatched, relevant sections are automatically retrieved and injected into the LLM prompt — giving the agent knowledge about products, tools, or APIs that the LLM wasn't trained on.
How it works
- Upload — Drop .txt, .md, .pdf, .html, .doc, or .docx files onto the skill's edit form. Text is extracted automatically and chunked for indexing.
- Retrieve — When a task matches, the portal searches document chunks using the task instruction and retrieves the top matching sections.
- Inject — Matching chunks are injected into the agent's system prompt as a REFERENCE DOCUMENTATION block, before the task instructions.
Uploading documents
- Go to Agent Skills and click the Edit (pencil) icon on a skill.
- Below the Detailed Description field, you'll see the Reference Documents section with a drag-and-drop zone.
- Drop one or more files (.txt, .md, .pdf, .html, .doc, .docx), or click the zone to browse.
- Each uploaded file is shown with its filename, size, chunk count, and upload date.
- To remove a document, click the trash icon next to it.
Chunking
Text is first extracted from the uploaded file (PDF → pdf-parse, DOC/DOCX → mammoth, HTML → tag stripping, TXT/MD → as-is), then split into chunks of ~1000–1500 characters for efficient retrieval:
- Markdown files are split on headings (##). Large sections are further split at paragraph boundaries. Heading context is preserved with each chunk.
- All other formats (plain text, PDF, HTML, DOC/DOCX) are split at paragraph boundaries. Short paragraphs are merged together.
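A simplified version of the markdown path. The heading split and the ~1000–1500 character target come from the text; the exact splitting and merging rules are illustrative:

```python
def chunk_markdown(text, max_len=1500):
    """Split markdown on '## ' headings, then at paragraph boundaries.

    Each chunk keeps its heading as context; sizes approximate the
    documented ~1000-1500 character target.
    """
    chunks = []
    sections = text.split("\n## ")
    for i, section in enumerate(sections):
        if i > 0:
            section = "## " + section        # restore the stripped heading marker
        heading = section.splitlines()[0] if section else ""
        if len(section) <= max_len:
            chunks.append(section.strip())
            continue
        # large section: split at paragraph boundaries, carrying the heading
        current = heading
        for para in section.split("\n\n")[1:]:
            if len(current) + len(para) > max_len and current != heading:
                chunks.append(current.strip())
                current = heading
            current += "\n\n" + para
        chunks.append(current.strip())
    return [c for c in chunks if c]
```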
Retrieval at task time
When a task is sent to an agent, the portal searches the skill's document chunks using PostgreSQL's websearch_to_tsquery. The top matching chunks (up to 10 chunks / 30,000 characters by default) are injected into the system prompt. If no chunks match the instruction, nothing is injected.
Limits
| Limit | Default | Environment Variable |
|---|---|---|
| Max file size | 2 MB | SKILL_DOC_MAX_SIZE_BYTES |
| Max documents per skill | 10 | SKILL_DOC_MAX_PER_SKILL |
| Max total size per skill | 10 MB | SKILL_DOC_MAX_TOTAL_BYTES |
| Max chunks per task | 10 | RAG_MAX_CHUNKS |
| Max chars per task | 30,000 | RAG_MAX_CHARS |
Use cases
- Custom application docs — Upload deployment guides, runbooks, or config references for internal tools.
- API documentation — Give agents context about APIs they need to interact with.
- Product manuals — Upload vendor documentation for products the LLM wasn't trained on.
- Compliance & procedures — Upload SOPs or checklists the agent should follow.