ManageLM Documentation
Manage your Linux and Windows servers with natural language — securely, instantly, at scale.
Overview
ManageLM is a remote server management platform. Instead of SSH-ing into servers and running commands manually, you describe what you want in plain English and ManageLM takes care of the rest.
ManageLM is available in two editions: self-hosted, where the portal runs on your own infrastructure via package installer or Docker, and SaaS, hosted by ManageLM.
- Portal — The control plane (this web app). Manages accounts, agents, skills, and bridges communication.
- Agent — A lightweight daemon on each managed Linux or Windows server. Receives tasks, uses an LLM to interpret them, executes commands, and reports back.
- Claude — Connects via MCP (Model Context Protocol) to the portal. You talk to Claude, Claude talks to your servers.
How It Works
- You ask Claude to do something on a server, e.g. "Restart nginx on web-01".
- Claude calls a tool on the portal via MCP. The tool is auto-generated from the agent's assigned skills.
- The portal dispatches the task to the target agent over a persistent WebSocket connection.
- The configured LLM (local Ollama/LM Studio, or a cloud provider) interprets the task and generates the shell commands.
- Commands are validated against the skill's allowlist before execution — only explicitly permitted commands can run.
- Results flow back through the agent → portal → Claude → you.
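As a rough mental model (the JSON below is purely illustrative; every field name is invented and not part of the actual wire format), a dispatched task can be pictured as:

```shell
# Illustration only: hypothetical shape of a task flowing portal -> agent.
# None of these field names are guaranteed by ManageLM.
task='{"target": "web-01", "skill": "services", "instruction": "Restart nginx"}'
echo "$task"
```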
What you can do
Just describe what you need in plain English. Here are examples across skills:
| Category | Example prompt |
|---|---|
| Services | "Restart nginx on web-01 and show me the last 20 log lines" |
| Packages | "Update all packages on production servers" |
| Users | "Add SSH access for Charly on user deploy on pocmail" |
| Security | "Run a security audit on all servers and email me a summary" |
| Access | "Who has sudo on production servers?" |
| Activity | "Run an activity audit on dev and show who logged in today" |
| Files | "Add a server block for api.example.com to nginx on web-01" |
| Firewall | "Open port 8080 on staging servers" |
| Containers | "List all running Docker containers on docker-01 and show which ones use more than 1GB memory" |
| Certificates | "Check TLS certificate expiry on all web servers" |
| Backups | "Show me the last backup status for every agent and which ones are failing" |
| Database | "Show the slow query log for MySQL on db-01" |
| Monitoring | "Which servers have disk usage above 85%?" |
| Multi-server | "Check if chrony is running on all servers, install it where it's missing" |
| LLM | "Pull llama3.2 on the Ollama server and test it with a simple prompt" |
These are not templates — you can phrase requests however you want. The agent interprets intent and adapts to each server's OS (Linux or Windows), package manager, and configuration.
Quick Start
From zero to managing a server with natural language — in under 10 minutes.
What You'll Need
- A Linux server (Ubuntu, Debian, RHEL, Rocky, Alma, Fedora, etc.) or Windows Server you want to manage
- Root/sudo access (Linux) or Administrator access (Windows)
- Python 3.9+ and curl installed (Linux), or Python 3.9+ and PowerShell 7+ (Windows)
- A web browser to access the ManageLM portal
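Before running the installer, you can sanity-check the Linux prerequisites yourself (a convenience sketch, not part of the product):

```shell
# Verify the two Linux prerequisites listed above: Python 3.9+ and curl.
if python3 -c 'import sys; raise SystemExit(0 if sys.version_info >= (3, 9) else 1)'; then
  echo "python3 OK"
else
  echo "python3 missing or older than 3.9"
fi
command -v curl >/dev/null 2>&1 && echo "curl OK" || echo "curl missing"
```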
- Create an account — Register on the portal and verify your email.
- Configure the LLM — Go to Settings → Account. Choose Local LLM (install Ollama and run ollama pull qwen3.5:9b) or Cloud LLM (enter a provider API key).
- Import skills — Go to Agent Skills → Catalog and import the skills your agent will need. Start with system, files, services, packages, and users.
- Install the agent — Click Add Agent in the dashboard, copy the install command, and run it on your server.
- Approve the agent — The portal detects the enrollment automatically. Verify the hostname and click Approve.
- Assign skills — Click on the agent, scroll to Assigned Skills, and assign the skills you imported.
- Connect Claude — Copy the MCP connector details from Settings → MCP & API into Claude Desktop or Claude Code.
- Run your first task — Ask Claude: "Show me the system info on web-01", or use the portal's Run Task button directly.
Create an Account
- Navigate to the portal and click Register.
- Enter your first name, last name, email, and password.
- Check your email for a verification link and click it.
- Log in to the portal. You're now the owner of your account.
Install an Agent
Agents are installed on any Linux or Windows server you want to manage. The install is a single command.
- Log in to the portal and go to My Agents.
- Click Add Agent.
- Optionally select a server group.
- Copy the install command and run it on your server:
Linux
curl -fsSL "https://your-portal/install.sh?token=..." | sh
The Linux installer will:
- Check prerequisites (Python 3.9+, curl)
- Download agent files to /opt/managelm/
- Install Python dependencies
- Enroll the agent with the portal
- Wait for your approval
- Set up a systemd service that starts automatically
Windows
On Windows, the portal provides a PowerShell install script. Copy it from the Add Agent modal (Windows tab) and run it in an elevated PowerShell session. The Windows installer performs the same enrollment steps and registers the agent as a Windows service.
Approve the Agent
After the install script runs, the agent appears in the portal as pending approval.
- The portal's Add Agent modal will automatically detect the new enrollment and show an approval prompt.
- Verify the hostname and click Approve.
- The agent receives its access token and connects via WebSocket.
- A green Connected indicator appears in the agent list.
You can also approve agents from the agent list by clicking the Approve button on any pending agent.
Set Up the LLM
Each agent uses an LLM to interpret tasks and generate commands. Configure from Settings → Account.
Option 1: Local LLM (Recommended)
Install Ollama or LM Studio for full data privacy — your commands and data never leave your infrastructure. The LLM can run on the agent server itself or on a dedicated machine accessible by your agents.
# Install Ollama
curl -fsSL https://ollama.com/install.sh | sh
# Pull a recommended model
ollama pull qwen3.5:9b
Ollama listens on http://localhost:11434 by default. If Ollama runs on a separate server, set the LLM API URL to its address (e.g. http://llm-server:11434) in Settings → Account.
Recommended local models
ManageLM agents need an LLM that reliably follows structured output formats (<cmd> tags, <done/> markers). For IT agent workloads — generating shell commands, managing services, parsing logs — models with strong instruction following perform best. We recommend the Qwen 3.5 family for the best balance across all hardware tiers, and the Gemma 4 family (E4B, 26B-A4B MoE, 31B Dense) when you need native multimodal support or the very long 256K-token context window for large log / config analysis.
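To see why that matters, here is an invented transcript (only the <cmd> and <done/> tag names come from the paragraph above): the agent mechanically extracts the command from between the tags, so a model that drifts from the format produces nothing runnable.

```shell
# Hypothetical LLM reply; the agent executes only what sits inside <cmd>...</cmd>.
reply='Restarting the service now. <cmd>systemctl restart nginx</cmd> <done/>'
cmd=$(printf '%s' "$reply" | sed -n 's/.*<cmd>\(.*\)<\/cmd>.*/\1/p')
echo "$cmd"   # systemctl restart nginx
```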
All VRAM figures below assume 4-bit quantization (Q4), which is the default for Ollama/LM Studio and keeps quality within ~2–5% of full precision while cutting memory by roughly 60%. Add 1–3 GB of overhead for the runtime, KV cache, and typical context — more for long contexts on dense models.
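As a back-of-the-envelope check (our own approximation, not an official formula): Q4 stores roughly 4.5 bits per parameter, i.e. about 0.56 GB per billion parameters, plus the overhead mentioned above.

```shell
# Rough Q4 VRAM estimate: params (in billions) * ~0.5625 GB + overhead (GB).
# The 0.5625 factor (~4.5 bits/param) is our approximation, including
# quantization block metadata; real usage varies by runtime and context length.
estimate_vram_gb() {
  awk -v p="$1" -v o="$2" 'BEGIN { printf "%.0f\n", p * 0.5625 + o }'
}
estimate_vram_gb 27 2   # 27B model + 2 GB overhead: ~17 GB
```

With 2 GB of overhead this reproduces the ~17 GB figure quoted for a 27B model below; treat larger deviations in the tables as runtime- and context-dependent.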
CPU-only servers (8–16 GB RAM)
| Model | Size | RAM | Notes |
|---|---|---|---|
| gemma-4-e4b | E4B (4B effective) | ~5 GB | Gemma 4 edge model — native multimodal (text, image, audio, video), 128K context. Runs on modest CPU or 8 GB class GPU. |
| qwen3.5:9b | 9B | ~7 GB | Best balance of speed and accuracy for CPU-only servers. |
| qwen3.5:4b | 4B | ~4 GB | Lightweight option for constrained servers or simple skills. |
| ministral-3:8b | 8B | ~6 GB | Mistral’s edge model with strong function calling and 128K context. Good alternative to qwen3.5:9b when Mistral’s instruction style fits your skills better. |
GPU servers (16–24 GB VRAM)
| Model | Size | VRAM | Notes |
|---|---|---|---|
| gemma-4-26b-a4b | 26B MoE (3.8B active) | ~18 GB | Mixture-of-Experts — only 3.8B active parameters at inference, so tokens-per-second are close to a 4B model while quality is close to a 26B. 256K context (memory stays modest: ~18 GB at 4K → ~23 GB at 256K). Fits RTX 3090/4090. |
| qwen3.5:27b | 27B | ~17 GB | Excellent quality for complex sysadmin tasks. Fits most GPUs (RTX 3090/4090). |
| qwen3.5:35b | 35B | ~24 GB | Top quality at this tier. Needs RTX 4090 or A5000. |
| mistral-small3.2 | 24B | ~16 GB | Mistral’s small model with strong function calling and instruction following. |
| ministral-3:14b | 14B | ~10 GB | Mid-tier Mistral model — fast tokens-per-second on consumer GPUs (RTX 3080/4070+) with solid tool-use. Leaves headroom for long contexts or parallel skills. |
High-end hardware (48+ GB — Mac Studio, DGX Spark, multi-GPU)
| Model | Size | Memory | Notes |
|---|---|---|---|
| gemma-4-31b | 31B Dense | ~20–40 GB | Google’s flagship dense Gemma 4 — top-tier open-weights quality with a 256K context window. ~20 GB at 4K context, scaling up to ~40 GB when filling the full 256K. Best on 48 GB cards (A6000, RTX 6000 Ada) for long-context work; fits on a single RTX 4090 at short contexts. |
| llama3.3:70b | 70B | ~45 GB | Full precision. Strong tool-use and structured output. Near cloud-LLM quality. |
| qwen3.5:35b | 35B | ~24 GB | Excellent quality with headroom for large context and throughput. |
| mistral-small3.2 | 24B | ~16 GB | Strong instruction following. Efficient for multi-model setups. |
Tip: You can mix models across agents, using a larger model for complex skills like containers or kubernetes and qwen3.5:9b for simple skills like system or users. This optimizes both quality and throughput.
Option 2: Cloud LLM
Use an external cloud provider instead of running a local LLM. Supported providers:
| Provider | Example models |
|---|---|
| Anthropic (Claude) | claude-opus-4-7, claude-sonnet-4-6, claude-haiku-4-5-20251001 |
| OpenAI (ChatGPT) | gpt-5, gpt-5-mini, gpt-5-nano, gpt-4.1, o3, o4-mini |
| Google (Gemini) | gemini-2.5-flash, gemini-2.5-flash-lite, gemini-2.5-pro, gemini-3-flash-preview, gemini-3.1-pro-preview, gemini-3.1-flash-lite-preview |
| xAI (Grok) | grok-4, grok-4-fast, grok-3, grok-3-mini |
| Groq | llama-3.3-70b-versatile, llama-4-scout-17b-16e-instruct, deepseek-r1-distill-llama-70b |
| Mistral | mistral-large-latest, codestral-latest, ministral-8b-latest |
| DeepSeek | deepseek-chat, deepseek-reasoner |
Select the provider and model from the dropdown, enter your API key, and click Test to verify the connection before saving.
LLM Access Mode
In self-hosted mode, you can choose how agents access the LLM:
- Direct (default) — Each agent calls the LLM directly. The API key is sent to the agent.
- Proxied — Agents route LLM calls through the portal. The API key stays on the portal server and is never sent to agents. This is useful for centralized key management or when agents should not have direct network access to the LLM provider.
Set the access mode from Settings → Account using the Direct / Proxied toggle. Agents with per-agent LLM overrides always use direct access regardless of this setting.
Assign Skills
Skills define what an agent is allowed to do. Without skills, an agent can only perform read-only operations.
- Go to Agent Skills in the sidebar.
- Click Catalog to browse the built-in skills.
- Import the skills you need (e.g. "Systemd Service Management", "Package Management").
- Navigate to your agent's detail page.
- In the Assigned Skills section, click the skill buttons to assign them.
Built-in Skills (31 total)
Skills are available for both Linux and Windows agents. On Linux, skills use shell commands (bash); on Windows, skills use PowerShell-based equivalents. The skill catalog includes platform-appropriate commands for each OS.
| Skill | What it can do |
|---|---|
base | Core read-only utilities (file reading, search, system info, resource usage, network diagnostics). Auto-assigned to all agents. |
system | System info, performance, hostname, timezone, kernel, reboot. |
files | Create, read, write, move, copy, delete files. Permissions, compression, upload/download. |
services | Start, stop, restart services. Systemd on Linux, Windows Services on Windows. Cron jobs, timers, scheduled tasks, process management. |
packages | Install, update, remove packages. Linux: apt, dnf, yum, pacman, zypper, snap. Windows: Chocolatey, winget, MSI. |
users | Create/manage system users, groups, SSH keys, and sudo access. |
network | Configure interfaces, routes, DNS, diagnose connectivity and ports. |
firewall | Manage firewall rules. Linux: UFW, firewalld, iptables, nftables. Windows: Windows Firewall (netsh/PowerShell). |
storage | Disks, partitions, filesystems, LVM, RAID, mounts, and swap. |
security | Auditing, hardening. Linux: fail2ban, SELinux/AppArmor, SSH config. Windows: Windows Defender, BitLocker, Group Policy, Windows Firewall. Intrusion detection. |
certificates | SSL/TLS certificates, Let's Encrypt, CAs, Java keystores, trust stores. |
logs | View, search, and analyze system and application logs (read-only). |
monitoring | System health, resource usage, disk/network I/O, service checks. |
containers | Docker, Podman, Buildah, images, volumes, networks, Compose. |
webserver | Nginx, Apache, Caddy, Tomcat — sites, configs, SSL, reverse proxy. |
webapps | Node.js, Python, PHP, Ruby, Java apps — PM2, Gunicorn, Supervisor. |
database | MySQL, PostgreSQL, SQLite — queries, schemas, users, backups. |
nosql | MongoDB, Redis, Elasticsearch — data operations, backups, clusters. |
git | Git repositories — clone, pull, push, branches, deployment workflows. |
backup | Backup and restore with rsync, tar, cron — files, dirs, databases. |
dns | BIND, Unbound, dnsmasq — zones, records, resolver configuration. |
email | Postfix, Dovecot, queues, aliases, DKIM, SPF, spam filtering. |
vpn | WireGuard, OpenVPN, IPsec — tunnels, peers, keys. |
virtualization | KVM/QEMU, libvirt, LXC/LXD, Proxmox, Vagrant. |
kubernetes | Pods, deployments, services, Helm, scaling, troubleshooting. |
proxy | Squid, Varnish, HAProxy — reverse proxy, caching, load balancing. |
messagequeue | RabbitMQ, Kafka, NATS, ActiveMQ — queues, consumers, messages. |
filesharing | NFS, Samba/SMB, FTP/SFTP, WebDAV. |
ldap | OpenLDAP, FreeIPA, SSSD — directory services, centralized auth. |
automation | Ansible, Terraform, cloud-init — infrastructure as code. |
llm | Ollama, vLLM, llama.cpp — local LLM server and model management. |
Every skill enforces a strict command allowlist. For example, an agent whose only skill is services can run only systemctl and journalctl on Linux; on Windows, only Get-Service, Restart-Service, etc. The agent rejects any command not on the list. An agent with no skills can only run read-only commands.
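A minimal sketch of how such allowlist validation can work (our illustration; the real validator is internal to the agent and certainly stricter): compare the command's first token against the skill's allowed commands.

```shell
# Sketch: accept a command only if its binary is on the space-separated allowlist.
is_allowed() {
  bin=${1%% *}                   # first token of the command line
  case " $2 " in
    *" $bin "*) echo allowed ;;  # binary is on the skill's allowlist
    *)          echo rejected ;; # anything else is refused
  esac
}
is_allowed 'systemctl restart nginx' 'systemctl journalctl'   # allowed
is_allowed 'rm -rf /'               'systemctl journalctl'    # rejected
```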
Choose Your Interface
ManageLM is not tied to a single tool. You can manage your servers from Claude, ChatGPT, your terminal, the web portal, VS Code, Slack, or n8n — pick whatever fits your workflow.
| Scenario | Claude MCP | ChatGPT | Shell | Portal | VS Code | Slack | n8n |
|---|---|---|---|---|---|---|---|
| Natural language tasks | ✓ | ✓ | ✓ | ✓ | ✓ | Slash cmds | Structured |
| Multi-step reasoning | ✓ Best | ✓ | — | — | ✓ | — | Workflows |
| Scheduled & automated tasks | Via portal | Via portal | ✓ Cron | ✓ Built-in | — | Webhooks | ✓ Native |
| Security audits & reports | ✓ | ✓ | ✓ | ✓ + PDF | ✓ | — | ✓ |
| Fleet operations | ✓ | ✓ | ✓ | ✓ Bulk select | ✓ | ✓ | ✓ |
| CI/CD & scripting | — | — | ✓ Best | ✓ API | — | Alerts | ✓ Best |
| Team collaboration | Per user | Per user | Per user | ✓ RBAC + audit | Per user | ✓ Shared channels | ✓ Shared |
| Offline / air-gapped | ✗ | ✗ | ✓ | ✓ Self-hosted | ✗ | ✗ | ✓ Self-hosted |
Connect Claude
ManageLM integrates with Claude via the Model Context Protocol (MCP). Claude sees your servers as tools it can call.
Claude Pro / Max
- Go to Settings → MCP & API in the portal.
- Find the Claude MCP Connector section.
- In Claude Desktop, go to Settings → Custom Connectors → Add.
- Copy the four fields (Name, Remote MCP URL, OAuth Client ID, OAuth Client Secret) from the portal and paste them into Claude.
This uses OAuth 2.0 with PKCE, the standard MCP authentication method. Custom Connectors require a Claude Pro or Max plan.
Claude Team
On Claude Team plans, the organization admin sets up the connector once, then team members connect it individually with their own ManageLM credentials.
Admin setup:
- Go to Organization Settings → Connectors → Add.
- Select Custom → Web.
- Fill in the same four fields (Name, Remote MCP URL, OAuth Client ID, OAuth Client Secret) from the portal's Settings → MCP & API section and save.
Team members:
- Go to Settings → Connectors in Claude.
- Connect the ManageLM connector — it will authenticate with their own ManageLM credentials.
Claude Free
On Claude's Free plan, Custom Connectors are not available. Instead, you can add the MCP server to your claude_desktop_config.json file using mcp-remote as a bridge with header-based authentication.
- Go to Settings → MCP & API in the portal and expand the Claude Desktop Free Plan section to copy the config.
- Open your claude_desktop_config.json file:
  - macOS: ~/Library/Application Support/Claude/claude_desktop_config.json
  - Windows: %APPDATA%\Claude\claude_desktop_config.json
  - Linux: ~/.config/Claude/claude_desktop_config.json
- Paste the following (replace the URL, credentials, and npx path with your values):
{
"mcpServers": {
"ManageLM": {
"command": "/full/path/to/npx",
"args": [
"mcp-remote",
"https://your-portal/mcp",
"--header", "X-MCP-Id:your-client-id",
"--header", "X-MCP-Secret:your-secret"
]
}
}
}
Replace /full/path/to/npx with the actual path to npx on your system (run which npx to find it). The credentials are available in Settings → MCP & API (click "Rotate Secret" if the secret is not yet generated). Save the file and restart Claude Desktop.
Claude Code
You can also configure the MCP server in Claude Code's JSON config using the same format:
{
"mcpServers": {
"ManageLM": {
"command": "npx",
"args": [
"mcp-remote",
"https://your-portal/mcp",
"--header", "X-MCP-Id:your-client-id",
"--header", "X-MCP-Secret:your-secret"
]
}
}
}
What Claude sees
Once connected, Claude gets one tool per skill slug (e.g. system, services, files). Each tool takes two parameters:
- target — Agent hostname, server group name, or "all".
- instruction — Natural language description of the task.
For example, Claude calls the services tool with target: "web-01" and instruction: "restart nginx".
Claude also gets built-in meta-tools:
- list_agents — List your servers with status, OS, health metrics, LLM readiness, and groups
- get_agent_info — Get detailed info for a single agent (OS, version, health, LLM status, assigned skills, recent tasks)
- list_agent_skills — See what an agent can do (assigned, group-inherited, and unassigned skills)
- list_available_skills — Discover catalog skills not yet imported into your account
- get_account_info — Check your plan, usage limits, and current consumption
- list_team_members — List account users with roles, permissions, and registered SSH public keys
- search_agents — Filter agents by health metrics, OS, status, group, or free text
- search_inventory — Search system inventory across all agents
- search_security — Search security audit findings across all agents
- search_ssh_keys — Search SSH keys: registered profile keys + deployed keys on servers with identity mapping
- search_sudo_rules — Search sudo privileges across all agents
- run_security_audit — Trigger a security audit and wait for results
- run_inventory_scan — Trigger an inventory scan and wait for results
- run_access_scan — Trigger an SSH & sudo access scan and wait for results
- run_activity_audit — Trigger an activity audit and wait for results
- get_task_status — Check on a running task
- get_task_history — View recent commands on a server
- get_task_changes — View file changes made by a task
- answer_task — Answer a question from an interactive task
- revert_task — Revert file changes made by a task
- send_email — Send yourself a report or summary email
Note: The tool list is fetched at connection time. If skills are added or removed during a session, Claude won't see the changes until you reconnect (restart Claude Desktop or re-open Claude Code).
Run Tasks
You can run tasks in two ways:
Via Claude (MCP)
Just describe what you want in natural language:
- "Restart the nginx service on web-01"
- "Install htop on all servers in the production group"
- "Check disk usage on db-01"
- "Show the last 50 lines of the postgresql log on db-01"
Via the Portal UI
- Click on an online agent in the dashboard.
- Click the Run Task button.
- Select a skill from the dropdown.
- Type a natural-language instruction describing what you want.
- Click Execute.
Task results appear in the Command History section on the agent detail page and in the Request Log page.
Agent CLI Tools
Three on-host commands ship with the agent and talk to the local daemon over a Unix socket — no network, no portal round-trip. They reuse the same skill gate, command validator, and kernel sandbox as portal tasks, and work offline whenever the local LLM is configured. All three are installed in /opt/managelm/bin/ (Linux) or C:\ProgramData\ManageLM\bin\ (Windows) and also exposed on the PATH.
The CLI tools run as root / LocalSystem on the managed host. They bypass portal user identity (no per-user RBAC) but still go through the skill's allowed-commands validator and the kernel sandbox, and every task is forwarded to the portal's audit log with source shell.
managelm-shell — Interactive terminal
A natural-language REPL on the managed server. Type what you want, the agent auto-routes it to the best skill, runs it through the sandbox, and streams the answer back.
# Interactive REPL
managelm-shell
# One-shot
managelm-shell -c "install htop and verify"
# Force a specific skill
managelm-shell
> @services restart nginx
- Auto skill routing — the daemon picks the right skill from your phrasing; use @skill to override.
- Multi-step planner — complex requests are auto-decomposed into sequential steps across skills, and a numbered execution plan is shown before running.
- Streaming output, tab completion, command history, elapsed time, and rich markdown rendering.
- Follow-ups — type > … to continue the previous conversation with full context.
- Changeset & rollback — file writes are snapshotted; changes lists them, rollback #N reverts.
- Interactive tasks — the LLM can pause and ask for information only you can provide (domain, password, licence key) and resume with the answer.
- Linux commands marked interactive run in a PTY with the local LLM driving stdin.
Shell tasks show up in the audit log, Reporting, and webhooks just like portal tasks.
managelm-fixit — Diagnose & fix one file
Point it at any misbehaving file. The agent classifies the content, picks the right skill, diagnoses the issue, and proposes a full-file replacement as a colored diff. Apply on y, reject on N.
# Diagnose, show diff, y/N to apply
managelm-fixit /etc/nginx/nginx.conf
# Diagnosis only, no diff
managelm-fixit --explain /etc/postfix/main.cf
# Auto-apply without prompting
managelm-fixit --yes /var/www/app/config.yaml
# Force a skill and add a hint
managelm-fixit @webserver -c "502 after upgrade" /etc/nginx/nginx.conf
- Content-based routing — the skill is chosen from the file's content, not its path, so it works on any text file (configs, scripts, code).
- Atomic writes with owner, group, and mode preserved; rolled back automatically if the post-fix validator fails.
- Same changeset log as shell tasks — managelm-shell → rollback #N reverts applied fixes.
- Non-zero exit codes distinguish nothing to fix / proposed but not applied / applied / error — suitable for CI or pre-commit hooks.
managelm-review — Read-only review
Where fixit writes, review only reads. Point it at a file or a directory and get a short summary plus a list of findings grouped by severity. Nothing is written to disk.
# Review a single file
managelm-review /etc/ssh/sshd_config
# Review a directory (walks recursively, skips .git, node_modules, …)
managelm-review ./src/
# Only warning + critical
managelm-review --severity warning ./src/
# JSON for CI
managelm-review --format json ./src/
- Findings carry a line number, severity (info / warning / critical) and category (security, bug, style, perf, maintainability).
- Directories are walked safely: noisy dirs skipped, symlinks not followed, hard cap at 20 files, confirmation above 5.
- Exit code 1 when findings are present at or above the severity threshold — drop it into a pre-commit hook or CI stage.
- Ideal companion to fixit: review a directory, then fixit the files that matter.
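That exit-code contract makes a pre-commit hook a one-liner. In the sketch below a shell function stubs managelm-review (pretending findings were present) so the control flow is runnable anywhere; a real hook would invoke the binary directly:

```shell
# Stub standing in for the real managelm-review binary: exits 1, as documented
# for findings at or above the chosen severity threshold.
managelm_review() { return 1; }

blocked=0
if ! managelm_review --severity warning ./src/; then
  echo "review findings at warning+ severity; commit blocked"
  blocked=1   # a real pre-commit hook would `exit 1` here
fi
```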
Quick Reports
Quick reports are one-click diagnostic commands available on the Agent Assets page. They let you run common checks on any online agent without writing instructions.
How it works
- Open the Agent Assets page.
- Each agent card shows small icon buttons below the OS info line — one per available report.
- Click an icon to run the report. A modal opens showing a spinner while the agent executes.
- When complete, the modal displays the LLM summary (a readable interpretation) and the raw terminal output.
- Click Copy to copy both summary and output to your clipboard.
Quick reports require the agents permission. Reports only appear for skills that are assigned to the agent (directly or via a group).
Available reports
Nine built-in skills include quick reports. Each report runs a pre-built instruction on the agent:
| Skill | Report | What it checks |
|---|---|---|
system | System Summary | Hostname, OS, kernel, uptime, load, memory and disk usage |
system | Top Processes | Top 10 processes by CPU and memory |
services | Service Inventory | All services with status and enabled state (systemd on Linux, Windows Services on Windows) |
services | Failed Services | Services in failed state |
packages | Available Updates | Packages with pending updates |
users | User Accounts | All users and groups with UID, GID, home, shell |
network | Listening Ports | All listening TCP/UDP ports with their process |
security | Security Overview | Listening ports, SSH config, fail2ban status |
containers | Container Status | All containers with name, image, status, ports |
storage | Disk Usage | Disk usage for all mounted filesystems |
logs | Recent Errors | Errors and warnings from the system journal (Linux) or Windows Event Log (last 30 min) |
Active task indicators
Agent cards on both the My Agents dashboard and the Agent Assets page display a red badge with a spinning icon when the agent has tasks currently running. The badge shows the number of active tasks (pending, sent, or executing).
Portal UI Guide
| Page | Purpose |
|---|---|
| My Agents | Dashboard showing all agents, their status, LLM info, skills, active task indicators, and a 7-day task activity chart. Add, approve, search, and bulk-manage agents. |
| Agent Detail | Configure an agent: display name, LLM settings, tags, groups, assigned skills, member access. Run tasks and view command history. |
| Agent Assets | Visual server map with agents organized by collapsible group zones, click-to-expand agent cards with 24h metrics and cloud provider metadata, quick report buttons, security audit, system inventory, SSH & sudo access, activity audit, service dependencies, scheduled PDF reports, and bulk select operations. |
| Agent Skills | Import from the built-in catalog, create custom skills, or import/export skill JSON files. |
| Agent Groups | Organize agents into groups (e.g. "production", "staging"). Groups are used in MCP tool names. |
| Users & Roles | Invite team members, assign roles (admin/member), and configure granular permissions. |
| Monitors | Service monitors — track availability and response time of 43 service types (HTTP, TCP, DNS, SMTP, databases, message brokers, VPNs, and more). Sparkline charts, status badges, alert toggles, categorized catalog, test-before-create. |
| Certificates | Certificate management — issue, renew, and revoke TLS certificates via internal CA or Let's Encrypt. Deploy to agents automatically. CRL generation. Daily auto-renewal sweep. |
| System Backups | System Backups — end-to-end encrypted filesystem backups to your own S3 storage (OVH, AWS, R2, B2, Wasabi, Scaleway, MinIO). Streaming downloads, restore to any agent, detach-on-delete, optional service quiesce for consistent database snapshots. |
| Pentests | Automated penetration testing for public-facing agents using nmap, nuclei, testssl.sh, ffuf, subfinder. Credit-based scans with domain verification. Results feed into the Compliance page. Pro/Business plans. |
| Compliance | Compliance framework mapping — automatically projects security audit and pentest results onto CIS Level 1, CIS Docker, SOC 2, PCI DSS, ISO 27001, NIS2, NIST CSF, and HIPAA. Drift detection with in-app and email alerts. Per-framework evidence PDFs. |
| Connectors | External integrations split into two kinds: Cloud Hosting (Azure, AWS, Google Cloud, VMware, Proxmox, OpenStack — credentials, test connections, sync resources, browse discovered cloud inventory with agent matching) and SIEM Integration (Splunk HEC, Elasticsearch _bulk, generic JSON webhook — agents forward task-completion events directly to the destination, per-agent or inherited from an agent group). |
| Request Log | View all MCP task requests across your account with status and output. |
| Audit Log | View a record of all actions taken in your account. |
| Reporting | Browse and export task execution history with date filters, search, pagination, and PDF export. Requires the perm_reports permission (admin/owner always have access). |
| Settings | Profile, Security (passkeys, MFA, SSH public keys, verified domains, sessions), MCP & API (credentials, IP whitelist, API keys, webhooks), PKI & CA (internal CA setup, Let's Encrypt account, DNS-01 providers, certificate defaults), S3 Backups (provider, bucket, credentials, orphan cleanup), Account (plan, LLM defaults, danger zone). |
Skills
Skills are the core security and capability model. Each skill defines:
- Operations — Named capabilities (e.g. restart, install, list) that describe what the skill can do.
- Allowed commands — The exact shell commands the agent can execute (e.g. systemctl, apt).
- System prompt — Instructions for the LLM on how to perform operations.
Management Hints
Each skill assignment (on an agent or a group) supports management hints — free-text contextual instructions injected into the LLM system prompt as an ADMINISTRATOR HINTS block. Use hints to provide server-specific or group-wide context that helps the LLM do its job:
- Custom paths: "PostgreSQL 16 data dir is /data/pg16, config in /etc/postgresql/16/"
- Port overrides: "Nginx runs on port 8080 behind HAProxy"
- Conventions: "Always use sudo -u postgres for database operations"
- Environment notes: "This is a staging server. Safe to restart services during business hours."
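The exact layout of the injected block is not documented; roughly (our illustration only), the agent's LLM would see something like:

```shell
# Illustration only: how hints might appear inside the skill's system prompt.
hints=$(cat <<'EOF'
ADMINISTRATOR HINTS
- Nginx runs on port 8080 behind HAProxy
- Always use sudo -u postgres for database operations
EOF
)
printf '%s\n' "$hints"
```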
Hints can be set at two levels:
| Level | Where to set | Scope |
|---|---|---|
| Per-agent | Agent detail → expand skill → Management Hints | This skill on this specific agent |
| Per-group | Agent Groups → expand skill → Management Hints | This skill on all agents in the group |
Direct per-agent skill assignments take priority over group-inherited ones (including their hints).
Skill Definition Example
Below is an example of a Linux skill definition. Windows skills follow the same structure but with PowerShell cmdlets in allowed_commands.
{
"description": "Manage systemd services",
"operations": [
{
"name": "restart",
"description": "Restart a systemd service"
},
{
"name": "status",
"description": "Get status of a service including recent logs"
}
],
"allowed_commands": ["systemctl", "journalctl"],
"system_prompt": "You are a Linux sysadmin..."
}
Operations are instruction-based — each operation has only a name and description. They describe capabilities for documentation and AI context, not structured parameter schemas. The agent LLM interprets the user's natural-language instruction to determine what commands to run.
Skill Combinations
Many real-world management tasks span multiple skills. Each skill controls a specific domain — when an operation touches several domains, you need all the relevant skills assigned to the agent.
Foundation skills
These five skills are used by almost every management workflow. Consider assigning them to all agents as a baseline:
| Skill | Why it's foundational |
|---|---|
system | System info, hostname, timezone — needed to understand what you're working with. |
files | Read/write config files, set permissions — almost every change touches a file. |
services | Start/stop/restart daemons, manage cron — most installations end with a service reload. |
packages | Install software — any new capability starts with installing a package. |
users | Create accounts, manage SSH keys, sudo — many services need a dedicated user. |
Common multi-skill workflows
Below are typical management tasks and the skills they require. Each example shows what you'd ask Claude and which skills are involved.
Create a new system user with SSH access
"Create user deploy with a home directory, add their SSH key, and set them up with sudo access for systemctl"
| Step | Skill needed |
|---|---|
| Create user account and group | users |
| Create home directory and set ownership | files |
| Add SSH authorized key | users |
| Configure sudoers entry | users |
Skills: users + files
Install and configure Nginx with SSL
"Install nginx, create a site for example.com with Let's Encrypt SSL, and open port 443 in the firewall"
| Step | Skill needed |
|---|---|
| Install nginx package | packages |
| Create site config file | webserver |
| Obtain SSL certificate via certbot | certificates |
| Enable the site and reload nginx | webserver |
| Open ports 80/443 in the firewall | firewall |
Skills: packages + webserver + certificates + firewall
Deploy a Node.js application
"Clone the repo from GitHub, install dependencies, set up a PM2 process, and configure nginx as a reverse proxy"
| Step | Skill needed |
|---|---|
| Create app user and directory | users + files |
| Clone the Git repository | git |
| Install Node.js and npm dependencies | webapps |
| Start the app with PM2 | webapps |
| Create nginx reverse proxy config | webserver |
| Set up SSL certificate | certificates |
Skills: users + files + git + webapps + webserver + certificates
Set up a PostgreSQL database server
"Install PostgreSQL 16, create a database and user for my app, configure backups, and open port 5432 only from 10.0.0.0/24"
| Step | Skill needed |
|---|---|
| Install PostgreSQL packages | packages |
| Start and enable the service | services |
| Create database and DB user | database |
| Edit pg_hba.conf for network access | files |
| Set up a pg_dump cron job | backup |
| Open port 5432 for the subnet | firewall |
Skills: packages + services + database + files + backup + firewall
Docker Compose deployment
"Create a docker-compose.yml for my app stack, start it, and check the container logs"
| Step | Skill needed |
|---|---|
| Create project directory and compose file | files |
| Start compose stack | containers |
| View container logs | containers |
| Open ports in firewall (if needed) | firewall |
Skills: files + containers + firewall
Security hardening
"Harden SSH (disable root login, key-only auth), set up fail2ban, and configure the firewall to allow only SSH and HTTPS"
| Step | Skill needed |
|---|---|
| Edit sshd_config | security |
| Restart sshd | services |
| Install and configure fail2ban | security |
| Set firewall rules (allow 22, 443 only) | firewall |
| Review auth logs | logs |
Skills: security + services + firewall + logs
Set up WireGuard VPN
"Install WireGuard, generate keys, configure a tunnel to 10.0.1.0/24, and open UDP port 51820"
| Step | Skill needed |
|---|---|
| Install WireGuard package | packages |
| Generate keys and create config | vpn |
| Enable IP forwarding (sysctl) | network |
| Open UDP 51820 in firewall | firewall |
| Start and enable the WireGuard service | services |
Skills: packages + vpn + network + firewall + services
Skill assignment strategies
Use server groups to assign skill sets by server role, so you don't have to configure each agent individually:
| Server role | Recommended skills |
|---|---|
| Web server | system, files, services, packages, users, webserver, certificates, firewall, logs, monitoring |
| App server | system, files, services, packages, users, webapps, git, logs, monitoring |
| Database server | system, files, services, packages, database, backup, firewall, storage, logs, monitoring |
| Docker host | system, files, services, packages, containers, network, firewall, storage, logs, monitoring |
| Minimal / read-only | system, logs, monitoring (no write skills — agent can only read) |
Assign only the skills a role actually needs — a database server doesn't need webserver, and a web server doesn't need database. Fewer skills = smaller attack surface.
Policy Rulesets
Rulesets are short markdown policy snippets attached to agents (directly or via groups). Every attached ruleset is concatenated and injected into the LLM system prompt as an unconditional POLICY RULES block — applied to every task, regardless of which skill runs.
Where management hints are advisory context scoped to a single skill ("PostgreSQL data dir is /data/pg16"), rulesets are cross-skill constraints that stay in force for the whole task ("Never restart services between 09:00 and 18:00 UTC", "Never edit files under /etc/pam.d without prior approval"). The prompt includes an explicit refusal rule: if a request would violate a listed policy, the agent refuses instead of executing.
Managing Rulesets
Go to Agent Skills → Rules. Each ruleset has:
- Slug — stable identifier (e.g. `change-window`, `pii-handling`).
- Name — display label.
- Content — markdown policy text, capped at 4 KiB per ruleset.
Attaching Rulesets
| Level | Where | Scope |
|---|---|---|
| Per-agent | Agent detail → Rulesets | This agent only |
| Per-group | Server Groups → Rulesets | All agents in the group |
Rulesets accumulate across attachments — an agent gets the union of everything attached directly plus everything inherited from every group it belongs to (deduplicated by ruleset id). Changes push to affected agents immediately over WebSocket; no restart required.
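The accumulation rule can be sketched as a union keyed by ruleset id (a simplified illustration; field names are assumptions):

```python
def effective_rulesets(direct: list[dict], group_sets: list[list[dict]]) -> list[dict]:
    """Union of directly attached and group-inherited rulesets, deduped by id."""
    seen: dict[int, dict] = {}
    for ruleset in direct + [r for group in group_sets for r in group]:
        seen.setdefault(ruleset["id"], ruleset)  # first occurrence wins
    return list(seen.values())
```

An agent in two groups that both attach `change-window` still gets that policy injected once.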
Creating and editing rulesets requires the `skills` permission (the same gate as creating custom skills).
Server Groups
Groups let you organize agents logically (e.g. by environment, role, or location).
- An agent can belong to multiple groups.
- Groups can be used as MCP targets (e.g. `target: "production"` runs the task on all agents in the group).
- You can control which team members can see which groups via user group access.
- Skills assigned to a group are inherited by all agents in that group (shown as read-only "(via group)" on the agent detail page).
Create and manage groups from the Agent Groups page. Assign agents to groups from the agent detail page or the groups page.
Group-level skill configuration
When assigning skills to a group, you can configure per-skill settings that apply to all agents in the group:
- Management hints — Contextual instructions injected into the LLM prompt. Use for group-wide conventions (e.g. "All webservers use /var/www as document root. Nginx config in /etc/nginx/sites-enabled/").
- LLM model override — Use a specific model for a skill across all agents in the group.
Click the chevron next to a skill in edit mode to expand the configuration panel.
Secrets
Each agent has a local secrets.txt file (/opt/managelm/secrets.txt on Linux, C:\ManageLM\secrets.txt on Windows). This file stores sensitive values that commands might need.
# Example secrets.txt
DB_USER=myapp
DB_PASS=s3cret
API_KEY="my-api-key"
The LLM only ever sees the variable names (e.g. `$DB_PASS`), never the actual values. Secrets never leave the server.
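Conceptually, the agent parses `secrets.txt` and substitutes `$VAR` references just before execution, so only names ever reach the LLM. A simplified sketch — the agent's actual parsing and quoting rules are not specified here:

```python
import re

def load_secrets(text: str) -> dict[str, str]:
    """Parse KEY=value lines, skipping comments and blanks."""
    secrets: dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        secrets[key.strip()] = value.strip().strip('"')
    return secrets

def substitute(command: str, secrets: dict[str, str]) -> str:
    """Replace $KEY references with secret values at execution time."""
    return re.sub(r"\$([A-Za-z_][A-Za-z0-9_]*)",
                  lambda m: secrets.get(m.group(1), m.group(0)), command)
```

So the LLM can emit `mysql -u $DB_USER -p$DB_PASS ...` without ever seeing the credentials.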
LLM Configuration
The LLM is configured from Settings → Account:
- Local LLM (Recommended) — Ollama or LM Studio, running on the agent server or a dedicated LLM host accessible by your agents. Full data privacy.
- Cloud LLM — External provider (Claude, ChatGPT, Gemini, Grok, Groq, Mistral, DeepSeek). Select from a dropdown, pick a model, and enter your API key. Use the Test button to validate the key.
For both Local and Cloud LLM, you can choose the access mode:
- Direct — Agent calls the LLM directly (default).
- Proxied — Agent routes LLM calls through the portal. The API key stays on the portal and is never exposed to agents.
Configuration hierarchy
LLM settings can be overridden at multiple levels (highest priority first):
| Level | Where to set | Use case |
|---|---|---|
| Per-skill override | Agent detail → expand skill → LLM Model Override | Use a specific model for complex skills |
| Per-agent override | Agent detail → Edit → "Override for this agent" | Agent needs a different LLM (local or cloud) |
| Account default | Settings → Account | Default for all agents |
The per-skill config panel also includes management hints for providing contextual instructions to the LLM.
Per-agent overrides offer Local LLM or Cloud LLM options and always use direct access. Agents inherit from the account default unless explicitly overridden.
Default values if nothing is configured:
- LLM API URL: `http://localhost:11434` (Ollama default)
- LLM Model: `llama3.2` (we recommend `qwen3.5:9b` — see model recommendations)
Users & Roles
ManageLM supports team collaboration with role-based access control.
Roles
| Role | Access |
|---|---|
| Owner | Full access. Cannot be removed. One per account. |
| Admin | Full access. Can invite users, manage permissions, edit settings. |
| Member | Limited access based on permissions. Only sees assigned agents and groups. |
Member Permissions
| Permission | Grants access to |
|---|---|
agents | Approve, delete, configure agents and assign skills |
groups | Create, rename, delete groups and assign agents |
skills | Create, import, edit, and delete skills |
logs | View task logs and MCP activity |
reports | Access the reporting dashboard and export reports |
MCP Visibility
All users (including owners and admins) only see agents via MCP that are:
- Explicitly assigned to them (agent detail → Assigned Users), or
- In a group they have access to (Users → user group access).
This ensures MCP access is always explicitly granted, regardless of role.
Skill Restrictions
For delegated admin members, you can optionally block specific skills from being invoked. On the Users & Roles page, expand the Skill restrictions row under a member's permissions and click Edit to add skills to the member's blocklist.
- Empty blocklist — no restriction; the member can invoke any skill the target agent carries.
- One or more skills blocked — the member may invoke every skill except the listed ones. This is a deny-list: new skills added to the catalog later are automatically available until you explicitly block them.
- `base` and any skill flagged as `required` cannot be blocked — they underpin every other skill, so the picker filters them out and the enforcement layer always allows them as a safety net.
- The blocklist applies to every entry point: REST API, MCP tools, chat modal, and resumed / follow-up tasks. For members with any blocklist set, the chat modal hides the `auto` option (the agent's LLM planner picks the skill at runtime with no visibility into the blocklist, so auto-routing could silently escape it).
- Owners and admins are exempt — the editor is hidden for them and the check always passes.
- On the Agents and Server Groups pages, blocked skills render faded with a red restricted badge for members whose blocklist covers them. All admin actions (edit, remove, configure hints) still work normally — the fade is a visual hint only, not a permission gate.
Skill restrictions are a pure portal-side filter — the agent still ships its full effective skill set (agent + group-inherited), and task execution logic is unchanged. Use permissions to gate management actions (creating agents, editing groups) and skill restrictions to gate operational ones (running sensitive skills on agents).
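The deny-list semantics reduce to a small predicate. A sketch (role names and the always-allowed set are taken from the rules above; the function name is an assumption):

```python
ALWAYS_ALLOWED = {"base"}  # plus any skill flagged as required

def may_invoke(skill: str, role: str, blocklist: set[str]) -> bool:
    """Portal-side check: owners/admins are exempt, base/required skills
    always pass, and otherwise everything is allowed unless explicitly
    blocked (deny-list)."""
    if role in ("owner", "admin"):
        return True
    if skill in ALWAYS_ALLOWED:
        return True
    return skill not in blocklist
```

An empty blocklist grants everything, which is why new catalog skills are available until explicitly blocked.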
Enforcement matrix
Where the blocklist is checked, and what happens when the caller is restricted:
| Entry point | Explicit skill | Auto-routing | Resumed follow-up / answer |
|---|---|---|---|
POST /api/tasks | Blocklist check | Rejected for restricted users | — |
POST /api/tasks/:id/follow-up | Blocklist check | Rejected | Blocklist check |
POST /api/tasks/:id/answer | Blocklist check | Rejected | Blocklist check |
| MCP skill tool invocation | Blocklist check | — | — |
MCP answer_task | Blocklist check | Rejected | Blocklist check |
managelm-shell on the managed host | Exempt (local root; no portal user identity) | Exempt | — |
| Portal-initiated scans (security, inventory, SSH & sudo, activity) | Exempt (not skills) | — | — |
Rejected tasks intentionally return the same error text as “skill not assigned to agent”, so the caller gets no hint that a restriction exists.
Inviting Users
- Go to Users & Roles.
- Click Invite User.
- Enter their name and email, select a role, and set permissions.
- They'll receive an email with an invitation link.
Passkeys & MFA
ManageLM supports WebAuthn passkeys for secure authentication.
- Register a passkey from Settings → Security → Passkeys.
- Enable MFA to require a passkey after password login.
- Passkeys can also be used for passwordless login from the login page.
You can register multiple passkeys (e.g. fingerprint + security key) and name them for easy management.
API Keys
API keys allow programmatic access to the portal API for automation and integrations.
- Go to Settings → MCP & API.
- Enter a name, select permissions (Agents, Logs, Skills, Groups, Reports), and optionally set an expiration (30, 90, 180, or 365 days).
- Click Create Key and copy the key (starts with `mlm_ak_`). It's only shown once.
Use the key in the Authorization header:
Authorization: Bearer mlm_ak_...
Each key's effective permissions are the intersection of the key's permissions and the creating user's permissions. If a user is later downgraded, their keys lose access accordingly. Expired keys are automatically cleaned up.
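Because the intersection is computed against the creator's *current* permissions, downgrading a user immediately narrows every key they created. A minimal sketch:

```python
def effective_permissions(key_perms: set[str], user_perms: set[str]) -> set[str]:
    """An API key can never exceed its creator's current permissions."""
    return key_perms & user_perms
```

A key created with `{agents, logs}` by a user who is later reduced to `{logs}` keeps only `logs` access.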
OAuth App Credentials (OpenAI GPT, etc.)
For integrations that require OAuth 2.0 (like OpenAI GPT Actions), set OAUTH_APP_CLIENT_ID and OAUTH_APP_CLIENT_SECRET in your .env file. These identify the application — each user still authenticates individually with their own ManageLM credentials. See the Self-Hosted Docker guide for details.
Security Model
Defense in depth
- Command allowlist — Skills define exactly which commands an agent can run. Enforced in code, not prompts.
- Destructive command guard — Even for allowed commands, the agent blocks catastrophically dangerous argument combinations: `rm` targeting protected root directories (`/`, `/etc`, `/usr`, etc.), `dd` writing to block devices, `mkfs`, `--no-preserve-root`, and `find -delete`.
- Kernel sandbox (opt-in, Linux only) — Landlock + seccomp-bpf confine command subprocesses at the kernel level. Even if a command passes all Python-level checks, the kernel blocks writes outside allowed paths and dangerous syscalls.
- Read-only by default — Agents with no skills (or skills with empty allowlists) can only run safe read-only commands.
- Outbound-only connections — Agents connect to the portal. No inbound ports needed.
- Ed25519 task signing — Every task dispatched to an agent is cryptographically signed. Agents verify the signature before execution.
- Secrets isolation — Secrets stay on the server. The LLM only sees variable names, never values.
- Hash-only storage — Passwords, tokens, and API keys are stored as hashes.
- Rate limiting — Login, registration, and password reset endpoints are rate-limited.
- IP whitelist — Optional CIDR-based IP whitelist for MCP connections.
- Execution limits — Max 10 LLM turns per task, 120s timeout per command, 8000 char output limit.
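A minimal sketch of the destructive-argument guard described above (illustrative only — the agent's real rule set and protected-path list are broader):

```python
PROTECTED_ROOTS = {"/", "/etc", "/usr"}  # illustrative subset

def is_destructive(tokens: list[str]) -> bool:
    """Block catastrophic argument combinations even for allowed binaries."""
    if not tokens:
        return False
    binary, args = tokens[0], tokens[1:]
    if "--no-preserve-root" in args:
        return True
    if binary == "rm":
        # Normalize trailing slashes; skip flag arguments.
        targets = [(a.rstrip("/") or "/") for a in args if not a.startswith("-")]
        return any(t in PROTECTED_ROOTS for t in targets)
    if binary == "dd":
        return any(a.startswith("of=/dev/") for a in args)  # writes to block devices
    if binary.startswith("mkfs"):
        return True
    if binary == "find" and "-delete" in args:
        return True
    return False
```

Note the guard runs *after* the allowlist check: `rm` may be permitted by a skill, yet `rm -rf /` is still refused.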
Always-allowed commands (read-only)
The base skill is auto-assigned to every agent and provides a broad set of read-only commands:
cat head tail less more ls tree grep egrep fgrep find locate wc sort uniq
awk sed cut tr diff comm column paste tac xargs file stat md5sum sha256sum
sha1sum readlink basename dirname realpath uname hostname whoami id uptime
date timedatectl lsb_release arch nproc getconf dmesg last lastlog w who
df du free lsblk lscpu lsmem vmstat iostat top ps pgrep lsof fuser
ip ss netstat dig nslookup host ping traceroute curl wget nc
echo printf which type test true false yes seq sleep cd pwd
Even if the base skill is somehow missing, agents fall back to a minimal safe set: cat head tail ls grep find wc sort echo printf test true false cd pwd which.
Kernel Sandbox
The kernel sandbox is available on Linux agents only. It adds Linux-native confinement to command subprocesses using Landlock and seccomp-bpf. It is opt-in per skill and disabled by default. Each layer can be enabled independently. Windows agents do not use the kernel sandbox.
How it works
When enabled on a skill, every command subprocess runs inside a kernel-enforced sandbox applied via preexec_fn after fork() but before exec(). The agent process itself stays unrestricted — only the command is confined.
Command from LLM
|
+- Layer 1: Injection blocking (Python — blocks $(), eval, etc.)
+- Layer 2: Binary allowlist (Python — must be in allowed_commands)
+- Layer 3: Destructive argument guard (Python — blocks rm -rf /, dd of=/dev/sda)
|
v subprocess.run(preexec_fn=sandbox)
|
+- Layer 4a: Landlock (Kernel — filesystem path confinement)
+- Layer 4b: seccomp-bpf (Kernel — syscall blocklist)
Landlock (filesystem confinement)
Restricts which filesystem paths the subprocess can read, write, and execute from. Uses Linux Landlock LSM (requires kernel 5.13+).
| Access | Default paths | Purpose |
|---|---|---|
| Read | / (everything) | Commands can read system state |
| Write | /etc, /var, /tmp | Config edits, logs, temp files |
| Execute | / (everything) | allowed_commands is the binary gate |
Everything outside the configured write paths is read-only at the kernel level — no userspace bypass possible. File uploads also enforce write paths via Python-level path validation using the same config.
seccomp-bpf (syscall filtering)
Blocks dangerous syscalls that no legitimate agent task should need. Returns EPERM (not kill) for graceful error handling.
| Category | Blocked syscalls |
|---|---|
| Filesystem root | mount, umount2, pivot_root, chroot, move_mount, fsopen, fsconfig, fsmount, fspick, open_tree |
| System control | reboot, kexec_load, kexec_file_load |
| Kernel modules | init_module, finit_module, delete_module |
| Swap | swapon, swapoff |
| Exploit primitives | ptrace, bpf, userfaultfd, perf_event_open |
| System identity | settimeofday, clock_settime, sethostname, setdomainname |
Enabling the sandbox
- Open the Skills page and edit a skill.
- Go to the Sandbox tab.
- Toggle Landlock and/or seccomp-bpf on.
- Customize write paths or blocked syscalls if needed.
- Click Save Changes.
The sandbox is pushed to agents automatically on save. Agents on kernels older than 5.13 (Landlock) or 3.17 (seccomp) gracefully degrade — the sandbox is skipped with a log warning, and commands run unrestricted.
Skill configuration (JSON)
{
"sandbox_landlock": {
"read_paths": ["/"],
"write_paths": ["/etc", "/var", "/tmp", "/opt/myapp"],
"exec_paths": ["/"]
},
"sandbox_seccomp": ["mount", "reboot", "ptrace", "init_module", "..."]
}
Each key is independent — use one or both. Absent key means that layer is off. Catalog skills include recommended sandbox templates that you can use as a starting point.
Requirements
- Landlock: Linux 5.13+ with Landlock LSM enabled (default on most modern distros).
- seccomp-bpf: Linux 3.17+ with `CONFIG_SECCOMP_FILTER=y` (enabled by default).
- Architecture: x86_64 and aarch64 supported.
- No kernel configuration needed — no kernel modules, no extra packages, pure `ctypes` syscalls.
Security Audits
ManageLM includes a built-in security audit and compliance engine that scans your agents for misconfigurations, vulnerabilities, and hardening issues. Audits are fully deterministic — no LLM required. Check commands are defined on the portal and executed on the agent in a read-only sandbox.
How it works
- Trigger — From the Agent Assets page (per agent), the Compliance dashboard (fleet-wide), or via MCP.
- Scan — The agent runs a set of read-only checks on the host.
- Report — Each finding includes a severity, an explanation of the risk, a suggested fix, and a mapping to compliance frameworks (CIS, PCI DSS, HIPAA, ISO 27001, NIS2, NIST CSF, SOC 2). A compliance score (0–100) reflects the overall posture. Installed packages are also matched against known vulnerabilities (see below).
- Results — Findings appear in the Agent Assets audit view and the Compliance dashboard. You receive an in-app notification when the audit completes.
Server context
Each compliance rule has separate severity ratings for public and private servers:
- Public (internet-facing) — stricter ratings. SSH root login = critical, missing firewall = critical.
- Private (internal network) — relaxed ratings. SSH root login = low, missing firewall = low, missing SELinux = low.
What is checked
| Check | What it inspects |
|---|---|
| SSH & RDP config | Root login, password vs. key authentication, retry limits, X11 forwarding, RDP Network Level Authentication. |
| Listening ports | Open TCP and UDP sockets on all interfaces. |
| Firewall | Host firewall status and rules (UFW, firewalld, nftables, iptables, or Windows Firewall profiles). |
| User accounts | Login-enabled users, UID 0 / local administrators, guest account, service accounts. |
| Password policy | Minimum length, complexity, lockout threshold. |
| Windows hardening | UAC enabled, cleartext credential storage disabled (WDigest), automatic login disabled. |
| File permissions | World-writable files, SUID binaries, shadow file readability. |
| Password hashing | Password hashes flagged if they use weak algorithms (MD5 or older). |
| Patch posture | Pending security updates, automatic-update service enabled, pending reboot after kernel or library updates. |
| Installed packages | Full package inventory feeding the vulnerability scan. |
| Authentication events | Failed login attempts in the last 24 hours. |
| Audit & event logging | Audit daemon (Linux) or Advanced Audit Policy (Windows); PowerShell script-block logging. |
| Endpoint protection | Mandatory access control (SELinux / AppArmor) or Windows Defender antivirus including signature freshness. |
| Time synchronization | System clock synchronized via NTP. |
| Kernel hardening | IP forwarding, ICMP redirect handling, reverse-path filtering, ASLR, SUID core dumps. |
| Brute-force protection | Fail2ban status and active jails. |
| TLS/SSL | Weak protocols (SSLv3, TLSv1.0/1.1) and weak ciphers (RC4, DES, NULL, EXPORT, MD5) rejected on all listening services. |
| Certificates | TLS certificate expiry with days remaining. |
| SMB hardening | SMB signing required, legacy SMB1 protocol disabled. |
| Network exposure | LLMNR (legacy name resolution) disabled on Windows. |
| Disk encryption | BitLocker protection on OS and fixed data volumes (Windows). |
| Scheduled tasks | System and per-user cron jobs. |
| SSH authorized keys | SSH key-based access across all users. |
| Docker | Privileged containers, socket exposure, containers running as root. |
| Vulnerability scan | Installed packages matched against known CVEs (see next section). |
Vulnerability scanning
As part of every security audit, ManageLM checks each agent's installed packages against a public vulnerability database and reports any known CVEs that apply to the installed versions. Nothing to install, nothing to configure.
- Coverage — All major Linux distributions (Debian, Ubuntu, Red Hat, Rocky, AlmaLinux, SUSE, openSUSE, Alpine, and others) plus language package managers (Python, npm, Go, Rust, Ruby, Java, .NET, PHP).
- Actively exploited — Vulnerabilities listed in CISA's Known Exploited Vulnerabilities catalog are automatically marked Critical and flagged as "KEV — actively exploited".
- Compliance impact — CVE findings contribute to the agent's compliance score and feed the patch-management controls of CIS, NIST CSF, NIS2, SOC 2, ISO 27001, and PCI DSS.
- Fix suggestions — Each finding includes the package name, installed version, CVE ID, a link to the advisory, and the exact upgrade command for the host's package manager.
Severity levels
| Level | Meaning |
|---|---|
| Critical | Immediate action required — actively exploitable or dangerous misconfiguration. |
| High | Significant risk — should be addressed promptly. |
| Medium | Moderate risk — recommended to fix. |
| Low | Minor issue or informational finding. |
| Pass | Check passed — no issue found. |
Findings
Each finding includes:
- Category — The area being checked (e.g. SSH, firewall, filesystem, users).
- Title — A short description of the check.
- Explanation — Details on what was found and why it matters.
- Remediation — A recommended fix for the issue.
Automated remediation
You can select one or more findings and click Remediate to have the agent automatically fix them. This requires:
- The Security & Hardening skill to be assigned to the agent.
- The agent to be online.
Remediation creates a task that uses the security skill and the agent's LLM to intelligently apply the recommended fixes. The agent backs up configuration files before making changes and validates them before restarting services.
PDF export
Click the Security button at the top of the Agent Assets page to download a fleet-wide security audit report. The PDF includes a summary bar with issue counts by severity, detailed findings with explanations and remediation steps, and a list of passed checks.
Use the Schedules popover in the Agent Assets toolbar to enable automatic report emails (Daily / Weekly / Monthly). Scheduled reports are generated and emailed as PDF attachments to all admin users who have report_ready notifications enabled. Changing the report schedule also sets the same scan schedule on all agents so data stays fresh.
Scheduled audits
You can configure automatic recurring audits per agent. Open the Security Audit modal and use the schedule selector in the top-right corner to choose a frequency:
- Manual only — No automatic scans (default).
- Daily — Runs once every 24 hours.
- Weekly — Runs once every 7 days.
- Monthly — Runs once every 30 days.
The scheduler checks every 15 minutes and triggers audits for agents that are overdue. Agents that have never been scanned are prioritized. A yellow badge (D, W, or M) appears on the agent card to indicate an active schedule.
Constraints
- Only one audit can run per agent at a time.
- Each agent stores its latest audit result. Previous results are archived to history (max one per day, configurable retention via `AUDIT_HISTORY_RETENTION_DAYS`).
- The agent must be online to start an audit (manual or scheduled).
- The `agents` permission is required to start audits, trigger remediation, and change the schedule. All authenticated users can view results.
Service Monitors
Monitor the availability and response time of services running on your agents. Monitors run directly from the agent's network, so they can check internal services (localhost, LAN) as well as public endpoints.
How it works
- Create — Open the Monitors page and click Add Monitor. Pick a service type from the catalog (43 types across 9 categories), select an agent, and configure the check parameters.
- Check — The agent runs the check locally on the configured schedule (1m, 5m, 15m, 30m, or 1h). Five check types are supported: TCP connect, HTTP request, DNS resolution, ICMP ping, and TLS certificate expiry.
- Report — The agent reports results to the portal. Only status transitions (up→down, down→up) and periodic summaries are sent — not every individual check. This keeps DB writes near zero when everything is up.
- Alert — When alerts are enabled, an email is sent to all users assigned to the target agent after a configurable number of consecutive failures (default: 3). A recovery email is sent when the service comes back up.
Service catalog
The monitor catalog (one JSON file per service in monitors/) defines 43 service types organized in 9 categories:
| Category | Services |
|---|---|
| Web | HTTP / HTTPS, REST API, HAProxy, Squid Proxy |
| Network | TCP Port, Ping (ICMP), DNS, NTP |
| SMTP, IMAP, POP3 | |
| Database | MySQL / MariaDB, PostgreSQL, SQL Server, Redis / Valkey, MongoDB, Elasticsearch, Memcached, ClickHouse, InfluxDB, Cassandra, CouchDB |
| Messaging | RabbitMQ, Kafka, NATS, MQTT |
| File Sharing | FTP / SFTP, SMB / CIFS, NFS, AFP, MinIO / S3, WebDAV |
| Remote Access | SSH, RDP, WinRM, OpenVPN, IPsec / IKEv2 |
| Infrastructure | LDAP / LDAPS, Kerberos, Docker API, Consul, Vault, etcd |
| Monitoring | Prometheus, Grafana, Zabbix |
Each service type maps to one of 5 agent check types (TCP, UDP, HTTP, DNS, Ping). TCP and HTTP checks support an SSL/TLS toggle for TLS handshake validation and optional certificate expiry warnings (works with self-signed certificates). Adding a new service is just a JSON file in the monitors/ directory — no code changes needed.
Alerts
Each monitor has an alert toggle and a configurable consecutive failure threshold (default: 3).
- Down alert — Sent when the monitor reaches the failure threshold. Emails all users assigned to the target agent (direct access + group access + admins/owners). An in-app notification and webhook (`monitor.down`) are also fired.
- Recovery alert — Sent when the monitor comes back up after being down. Same recipients. Webhook: `monitor.up`.
- Manual refresh — The Refresh button triggers immediate checks on all monitors but does not fire alerts (prevents false alerts from manual testing).
Test before creating
The Test button in the create/edit modal sends an ad-hoc check to the agent and shows the result immediately (up/down, response time, error) without creating or saving the monitor.
Data & charts
- Response time sparkline — Each monitor in the list shows a mini chart of recent response times (from hourly rollup data).
- Detail modal — Click a monitor to see uptime percentages (24h, 7d, 30d), a full response time chart, and the status change timeline.
- Infrastructure badges — Agent cards in the Agent Assets page show a monitor status badge (e.g. “3/3 up” or “1 down”).
Permissions
- All authenticated users can view monitors, statuses, charts, and history.
- The `perm_monitors` permission (or admin/owner role) is required to create, edit, or delete monitors and to toggle alerts.
- The permission toggle appears in Users & Roles under “Admin permissions”.
MCP integration
Two MCP tools are available for AI assistants:
- `list_monitors` — List all monitors with their current status, response time, agent, and schedule.
- `search_monitors` — Filter monitors by status (down), service type (mysql), agent name, or free text.
Per-Plan Limits
The number of monitors per account is limited by your plan (Free: 20, Pro: 100, Business: 200, Enterprise: unlimited). The Monitors page shows your usage against the limit. The Add Monitor button is disabled when the limit is reached.
Certificates & PKI
Manage TLS certificates for your agents directly from the portal. Two certificate sources are supported:
- Internal CA — Create or import an RSA-4096 Certificate Authority. Issue leaf certificates (ECDSA P-256, RSA-2048, or RSA-4096) signed by your CA. A CRL is automatically generated and served at a public URL.
- Let's Encrypt — Register an ACME account and issue free, publicly-trusted certificates. Two challenge types are available:
- HTTP-01 (default) — The agent handles the challenge automatically on port 80. Requirement: the agent must be reachable on inbound TCP port 80 from the internet.
- DNS-01 — The portal creates a DNS TXT record via your configured DNS provider. Works with any agent (public or private) and supports wildcard certificates (`*.example.com`). Configure DNS providers in Settings → PKI & CA → DNS-01 Providers. Supported providers: Cloudflare, DigitalOcean, Hetzner DNS, OVH.
Setup
- Configure a CA or LE account — Go to Settings → PKI & CA. Create a new internal CA, import an existing sub-CA, or register a Let's Encrypt account. Optionally add DNS-01 providers for DNS-based certificate validation.
- Set defaults — Configure default certificate validity (14–365 days), key type (ECDSA P-256, RSA-2048, RSA-4096), and renewal window (7–90 days before expiry).
- Issue certificates — Go to Certificates, click New Certificate, pick a target agent, and fill in the common name, file paths, and optional SANs.
Certificate Lifecycle
- Issue — The agent generates a keypair and CSR locally — the private key never leaves the agent. The portal validates the CSR, signs it with the internal CA (or submits it to Let's Encrypt via ACME), and sends only the signed certificate back to the agent over WebSocket. The agent writes the cert and key to disk and reloads the target service.
- Renew — Manual via the Renew button, or automatic via the daily renewal sweep. Renewal generates a fresh keypair and CSR on the agent, signs a new certificate, deploys it, then revokes the old one. For LE certs, the old certificate is also revoked on Let's Encrypt's side.
- Revoke — Marks the certificate as revoked and updates the CRL. For LE certs, the revocation is also sent to Let's Encrypt's ACME endpoint. Revoked internal CA certs can be reactivated; LE revocations are permanent.
- Delete — Soft-deletes the certificate (must be revoked, expired, or failed first). The serial stays in the CRL until the certificate's natural expiry date, then is purged by the daily sweep.
CRL & Public Endpoints
The portal serves two public endpoints (no authentication required):
- `/pki/<crl_id>.crl` — DER-encoded CRL signed by the internal CA. Cached in Redis with 7-day validity.
- `/pki/<crl_id>.cer` — DER-encoded CA certificate for trust chain installation.
Both URLs are embedded in issued certificates as the CRL Distribution Point and Authority Information Access extensions.
Auto-Renewal Sweep
A daily background task (Redis-locked for HA) handles certificate lifecycle:
- Expires certificates whose `not_after` has passed.
- Renews active certificates that are within the renewal window and whose agent is online.
- Purges soft-deleted certificates whose serial has naturally expired, and stale failed/pending rows older than 7 days.
- Sends notifications (in-app, email, webhook) on renewal success or failure when alerts are enabled.
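The locking pattern behind the sweep can be sketched as follows. This is an illustrative Python model: `SweepLock` stands in for a Redis `SET NX EX` lock, and the field names (`status`, `not_after`, `agent_online`) are hypothetical, not ManageLM's actual schema.

```python
import time

class SweepLock:
    """In-memory stand-in for the Redis lock that keeps the daily sweep
    single-instance across HA portal nodes (illustrative only)."""
    def __init__(self):
        self._holder, self._expires = None, 0.0

    def acquire(self, instance: str, ttl: float) -> bool:
        now = time.monotonic()
        if self._holder is None or now >= self._expires:
            self._holder, self._expires = instance, now + ttl
            return True
        return False

def renewal_candidates(certs, now, window_days):
    # Renew active certificates inside the renewal window whose agent is online
    return [c["cn"] for c in certs
            if c["status"] == "active"
            and c["agent_online"]
            and c["not_after"] - now <= window_days * 86400]

def run_sweep(lock, instance, certs, now, window_days=30):
    if not lock.acquire(instance, ttl=3600):
        return None          # another portal instance holds the lock this cycle
    return renewal_candidates(certs, now, window_days)
```

Only the instance that wins the lock performs the sweep; the others skip the cycle and retry the next day.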
Permissions
- CA and LE account management is restricted to the account owner.
- The `perm_certificates` permission (or admin/owner role) is required to issue, renew, revoke, and delete certificates.
- All authenticated users can view certificates, their status, and deployment details.
Per-Plan Limits
The number of certificates per account is limited by your plan (Free: 10, Pro: 50, Business: 100, Enterprise: unlimited). The Certificates page shows your usage against the limit. The New Certificate button is disabled when the limit is reached.
MCP Tools
- `list_certificates` — List all certificates with status, agent, expiry, and key type.
- `search_certificates` — Filter certificates by status, source, agent name, or free text.
System Backups
End-to-end encrypted filesystem backups from your agents to your own S3 storage. ManageLM never sees your data — the agent encrypts every archive locally before uploading, and only the ciphertext transits via your S3 bucket. Restore to any online agent at any time.
Providers
The S3 bucket is configured once per account in Settings → S3 Backups. Provider-agnostic — one set of credentials, any S3-compatible storage:
- OVH Object Storage (recommended for EU-based data residency)
- Amazon S3
- Cloudflare R2 (no egress fees)
- Backblaze B2
- Wasabi
- Scaleway Object Storage
- MinIO (self-hosted S3)
Each provider has a one-click preset that pre-fills the endpoint URL. The Test button validates credentials via HeadBucket before saving. Secret keys are stored AES-256-GCM encrypted at rest.
Encryption
Every backup has its own randomly generated 32-byte master key, stored wrapped server-side. Before each run, the portal sends the key to the agent over the existing mTLS WebSocket channel — never over HTTP, never logged.
- AES-256-CBC + HMAC-SHA256 — encrypt-then-MAC pattern, industry standard.
- Domain-separated subkeys — encryption key = `SHA256(master || "enc")`, MAC key = `SHA256(master || "mac")`.
- Wire format — `IV (16 bytes) || ciphertext || HMAC tag (32 bytes)`.
- Restore-time verification — the HMAC is checked at end-of-file using a rolling-window stream; a mismatch destroys the download stream so the browser sees a broken file instead of partial or corrupt bytes.
Pure-Python implementation on the agent via `oscrypto` — no `cryptography` package, no native build dependencies.
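The subkey derivation and wire format above can be modeled in a few lines of Python. This is a minimal sketch of the documented scheme, not the agent's code: the AES-256-CBC step is omitted and an already-encrypted payload is assumed, since only the derivation and framing are shown.

```python
import hashlib
import hmac

def derive_subkeys(master: bytes) -> tuple[bytes, bytes]:
    # Domain separation: distinct subkeys for encryption and authentication
    enc_key = hashlib.sha256(master + b"enc").digest()
    mac_key = hashlib.sha256(master + b"mac").digest()
    return enc_key, mac_key

def frame(iv: bytes, ciphertext: bytes, mac_key: bytes) -> bytes:
    # Wire format: IV (16 bytes) || ciphertext || HMAC-SHA256 tag (32 bytes)
    tag = hmac.new(mac_key, iv + ciphertext, hashlib.sha256).digest()
    return iv + ciphertext + tag

def unframe(blob: bytes, mac_key: bytes) -> tuple[bytes, bytes]:
    iv, body, tag = blob[:16], blob[16:-32], blob[-32:]
    expected = hmac.new(mac_key, iv + body, hashlib.sha256).digest()
    # Constant-time comparison; on mismatch the stream must be destroyed
    if not hmac.compare_digest(tag, expected):
        raise ValueError("HMAC mismatch: archive corrupt or tampered")
    return iv, body
```

Note the encrypt-then-MAC order: the tag covers the IV and ciphertext, so tampering is detected before any decryption is attempted.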
Schedule & Retention
Each backup has its own cadence and retention:
| Schedule | Configurable Fields |
|---|---|
| Every hour | — |
| Every 6 hours | — |
| Daily | Run time (HH:MM, agent-local) |
| Weekly | Day of week + run time |
| Monthly | Day of month (1–31, clamped) + run time |
FIFO retention — specify how many snapshots to keep (1–90). Older snapshots are automatically rotated out by the cleanup cron, which best-effort deletes the S3 object then the DB row.
Quiesce services during backup
For a consistent snapshot of databases and stateful apps, list one or more services to stop during the backup (comma-separated, e.g. `postgresql, redis`). The agent:
- Stops each listed service via `systemctl stop` (Linux) or `net stop` (Windows) — 30-second timeout per service.
- Runs the tar → encrypt → upload pipeline.
- Restarts every service that was successfully stopped — in a `try/finally` so a backup failure (or the agent being killed mid-run) never leaves services down.
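The restart guarantee comes from the `try/finally` shape. A sketch, with the service-control commands abstracted behind callables (the real agent shells out to `systemctl` or `net`; this model is illustrative):

```python
def run_backup_with_quiesce(services, stop, start, do_backup):
    """stop/start wrap `systemctl stop|start` (Linux) or `net stop|start`
    (Windows); do_backup stands in for the tar -> encrypt -> upload pipeline."""
    stopped = []
    try:
        for svc in services:
            stop(svc)            # the agent applies a 30-second timeout here
            stopped.append(svc)
        return do_backup()
    finally:
        # Runs even when do_backup() raises or a stop fails part-way:
        # only services that were actually stopped are restarted.
        for svc in stopped:
            start(svc)
```

Tracking the `stopped` list separately from `services` matters: if the second of three stops fails, only the first service is restarted, and nothing that was never stopped gets a spurious start.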
Run flow
- Agent requests a presigned `PUT` URL from the portal; the portal pre-inserts a `pending` snapshot row.
- Agent tars the source path (with optional excludes), encrypts the archive, and uploads directly to S3 — never through the portal.
- Agent reports size, file count, duration, and SHA-256 via `backup_status`.
- Portal flips the snapshot to `ok`/`failed`; the cleanup cron reaps stuck `pending` rows after 6 hours.
Download & Restore
- Download decrypted `.tar.gz` — mints a one-shot Redis token (5 min TTL), then triggers a native browser download. The portal streams S3 bytes through a rolling-window HMAC/decrypt pipeline; the save dialog opens immediately and the progress bar fills as the archive arrives.
- Restore to any agent — pick a target agent and target path in the Restore modal. The agent downloads, verifies the HMAC before touching disk, strips the top-level source directory, and extracts with a path-traversal guard. You can restore to the original agent or any other online agent — the backup is portable.
- Abort a running backup — cancel a stuck upload and reclaim the snapshot slot.
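The path-traversal guard used during extraction can be illustrated with the standard library. A sketch of the idea, not the agent's actual code:

```python
import os
import tarfile

def safe_extract(tar: tarfile.TarFile, dest: str) -> None:
    """Extract only after verifying that every member stays inside dest."""
    dest_real = os.path.realpath(dest)
    for member in tar.getmembers():
        target = os.path.realpath(os.path.join(dest_real, member.name))
        # Refuse members that would escape the destination (e.g. "../etc/passwd")
        if not (target == dest_real or target.startswith(dest_real + os.sep)):
            raise ValueError(f"blocked path traversal: {member.name}")
    tar.extractall(dest_real)
```

Resolving each member through `os.path.realpath` also defuses tricks like `a/../../etc`, which a plain string prefix check on the raw member name would miss.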
Detach on agent delete
When you delete an agent that has backups, the backups are not deleted — their `agent_id` is cleared instead. The S3 data and snapshot history survive the hardware replacement. A purple Reassign button appears in the backup row; clicking it opens the edit modal with an Agent picker so you can attach the backup to a new agent and continue the schedule. The UI also warns you about the detached count before confirming the agent deletion.
S3 orphan cleanup
The S3 Cleanup button in Settings → S3 Backups scans your bucket under the account prefix and deletes objects that have no matching snapshot row in the portal. Useful when the bucket was deleted externally, credentials were rotated mid-run, or you want to reclaim storage after manually removing backups.
Alerting
Per-backup toggle for alert-on-failure emails. ManageLM also detects stalled backups: if a scheduled backup is missed because its agent is offline, you receive a single consolidated alert per agent rather than one alert per missed run.
Permissions
- Read access — all authenticated users can view the Backups page, snapshot counts, and last-run status.
- Backups Admin (`perm_backups`) — required to create, edit, delete, run now, reassign detached backups, download decrypted snapshots, restore, and configure account S3 settings.
- Owners and admins bypass the permission check.
Per-Plan Limits
The number of backups per account is limited by your plan (Free: 20, Pro: 100, Business: 200, Enterprise: unlimited). Detached backups still occupy a slot — delete them explicitly to free the slot.
Constraints
- Each snapshot is held fully in memory during encrypt/decrypt on the agent (oscrypto one-shot API). Practical upper bound: 4 GB archive size. For larger datasets, split into smaller backups.
- You must configure S3 storage before creating your first backup.
- Restoring to a detached backup requires explicitly picking a target agent.
Pentests
ManageLM includes automated penetration testing for your public-facing agents. Pentests scan your servers from the outside — testing what an attacker would see. Available on Pro and Business plans.
How it works
- Select — Open the Pentests page and click New Pentest. Choose one or more public agents, select the tests to run, and optionally add target URLs.
- Validate — The portal sends a one-time token to the agent. The agent validates with the pentest service from its public IP, proving it controls the target.
- Scan — The pentest service runs tools sequentially: nmap (port discovery), nuclei (vulnerability scanning), testssl.sh (TLS audit), and more depending on selected tests.
- Report — An LLM generates a human-readable report with findings, severity ratings, and a security score (0–100). Results appear in the Agent Assets audit modal (Pentest tab) and the Pentests dashboard.
Available tests
| Test | What it scans | Credits |
|---|---|---|
| Basic Scan | Port discovery (nmap), vulnerability scan (nuclei), TLS quick check (testssl) | 3 |
| Full Port Scan | All 65,535 TCP ports | 3 |
| Vulnerability Scan | Extended nuclei templates (critical/high/medium) | 3 |
| SSL/TLS Audit | Full testssl.sh analysis (per URL) | 1 |
| Web App Scan | Nuclei web templates (per URL) | 3 |
| DNS Audit | SPF, DMARC, DKIM, MX records (per URL) | 1 |
| HTTP Headers | Security headers analysis (per URL) | 1 |
| Directory Scan | Common path discovery with ffuf (per URL) | 2 |
| Subdomain Enum | Subdomain discovery with subfinder (per URL) | 1 |
URL-based tests run once per target URL. Credit cost is calculated as: IP-based test credits + (URL-based test credits × number of URLs).
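Worked out in code, the formula looks like this. The test identifiers are illustrative; the credit values come from the table above:

```python
# Credits per test, from the pricing table (identifiers are illustrative)
IP_TESTS = {"basic_scan": 3, "full_port_scan": 3, "vulnerability_scan": 3}
URL_TESTS = {"ssl_tls_audit": 1, "web_app_scan": 3, "dns_audit": 1,
             "http_headers": 1, "directory_scan": 2, "subdomain_enum": 1}

def credit_cost(selected_tests, target_urls):
    # IP-based credits are charged once; URL-based credits scale with URL count
    ip_cost = sum(IP_TESTS[t] for t in selected_tests if t in IP_TESTS)
    url_cost = sum(URL_TESTS[t] for t in selected_tests if t in URL_TESTS)
    return ip_cost + url_cost * len(target_urls)
```

For example, Basic Scan plus SSL/TLS Audit and HTTP Headers against two URLs costs 3 + (1 + 1) × 2 = 7 credits.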
Credits
Pentests consume credits. Credits are deducted after a successful scan — failed scans are not charged.
- Bundled credits — Pro and Business plans include credits when you first subscribe.
- Purchase more — Click Add Credits in the Pentests page or Settings > Account to buy additional credit packs.
- Balance — Your remaining credits are shown in the Pentests dashboard and in Settings > Account.
Domain verification
Before scanning URLs, you must verify domain ownership. The pentest service generates a DNS TXT record that you add to your domain. Once verified, the domain stays valid for 24 hours before requiring re-verification.
Compliance integration
Pentest results automatically feed into the Compliance page. Each tool produces a pass/fail rule that maps to framework controls (CIS, PCI-DSS, SOC 2, ISO 27001, NIS2, NIST CSF, HIPAA). Pentest rules appear alongside security audit rules in framework coverage views.
Constraints
- Only public agents (internet-facing) can be pentested — the service scans from the outside.
- One pentest per agent at a time.
- Target URLs must DNS-resolve to the agent's public IP.
- The agent must be online to validate the scan token.
- Requires a Pro or Business plan with sufficient credits.
Compliance & Frameworks
The Compliance page maps your security audit results to industry compliance frameworks. ManageLM evaluates your fleet against each framework's controls and shows which pass, fail, or are not covered by the current rule set.
Supported frameworks
| Framework | Version | Description |
|---|---|---|
| CIS Level 1 | v8.0 | Center for Internet Security — essential security hygiene for servers |
| CIS Docker | v1.6 | CIS Docker Benchmark — container runtime security |
| SOC 2 | 2017 | Trust Services Criteria — Security principle technical controls |
| PCI DSS | v4.0 | Payment Card Industry Data Security Standard |
| ISO 27001 | 2022 | ISO/IEC 27001 Annex A — information security controls |
| NIS2 Directive | 2022 | EU Directive 2022/2555 — network and information security measures |
| NIST CSF | v2.0 | NIST Cybersecurity Framework — Protect, Detect, Identify functions |
| HIPAA Security Rule | 2013 | 45 CFR §164.312 — technical safeguards for protected health information |
How controls are evaluated
Each framework control is backed by one or more checks from security audits, pentests, and vulnerability scans. A control passes only when every backing check passes on every agent. If any check fails on any agent, the control fails. Controls with no data yet (no agents scanned) show as not covered.
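A minimal sketch of these semantics (the row shape is illustrative, not ManageLM's schema):

```python
def control_status(check_results):
    """check_results: (check_slug, agent, passed) rows gathered from
    security audits, pentests, and vulnerability scans."""
    if not check_results:
        return "not_covered"   # no agents scanned yet
    if all(passed for _, _, passed in check_results):
        return "pass"          # every backing check passed on every agent
    return "fail"              # any failure on any agent fails the control
```

The strictness is deliberate: a control that passes on nine of ten servers still fails, because the framework requirement applies fleet-wide.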
Compliance dashboard
The Compliance page has two tabs:
Agents tab
- Fleet score — Average compliance score across all agents, with trend charts.
- Issue breakdown — Critical, high, medium, low counts with stacked area chart over time.
- Drift detection — Rules that changed from pass to fail between scans (shown as an alert at the top).
- Per-agent detail — Expand any agent row to see score history, rule results by category, raw check output, and a history slider to compare past states.
- Re-scan — Trigger audits per agent or fleet-wide with the Scan All button (requires reports permission).
Frameworks tab
- Framework list — All frameworks with compliance percentage, progress bar, and icon badge.
- Expand a framework — Shows each control with pass/fail status, reference number, description, and website link.
- Expand a control — Shows the underlying technical checks with fleet-wide pass/fail counts.
- Evidence PDF — Download button on each framework (enabled at ≥ 50% compliance).
Security drift notifications
When a security audit completes and a rule that previously passed now fails, ManageLM detects this as drift. Drift is shown in the Compliance dashboard as an alert. Optionally, admins can enable the Security Drift email notification in Settings > Email Notifications to receive an email with the new issues.
Drift detection only triggers when there is audit history — the first scan for an agent never generates drift alerts.
Evidence PDF export
Each framework has an Evidence PDF button (enabled when compliance is ≥ 50%). The generated PDF is designed for auditors and includes:
- Cover page — Framework name, version, compliance percentage, agent count.
- Scope & methodology — Assessment date, server count, evaluation method, reference URL.
- Control evidence — Each control as a card with status badge, description, technical checks table (pass/fail per check with fleet-wide counts), and for failing controls: per-agent findings with remediation guidance and raw command output as evidence.
- Assessed infrastructure — Table of all servers with compliance score, exposure level (public/private), and last audit date.
- Disclaimer — Technical controls only, not a compliance certification.
The fleet-wide Export PDF button on the Compliance page generates a summary report covering all frameworks.
Adding custom frameworks
Frameworks are defined as JSON files in the `frameworks/` directory. Each file specifies an `id`, `name`, `version`, `description`, `url`, and an array of controls. Each control maps to existing `rule_slugs`. No code changes are needed — drop a new JSON file and restart the portal.
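A minimal framework file might look like this. The control fields `ref` and `title` and the slug value are hypothetical; use `rule_slugs` that exist in your rule set:

```json
{
  "id": "acme-baseline",
  "name": "ACME Internal Baseline",
  "version": "1.0",
  "description": "Example custom framework",
  "url": "https://example.com/baseline",
  "controls": [
    {
      "ref": "1.1",
      "title": "SSH root login disabled",
      "rule_slugs": ["ssh_permit_root_login"]
    }
  ]
}
```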
System Inventory
ManageLM discovers all running services, installed packages, and system components on your agents. Checks are read-only and no skill assignment is required.
How it works
- Trigger — Open an agent's detail panel on the Agent Assets page. Click the clipboard icon to open the System Inventory modal, then click Run Inventory.
- Scan — The agent collects information about the system using a read-only set of checks.
- Structure — The agent's configured LLM categorizes the results and extracts product names and versions. This is the only built-in report that uses the LLM. Without an LLM, a minimal inventory is returned.
- Results — Inventory items appear in the modal, grouped by category.
What is collected
| Check | What it inspects |
|---|---|
| System Info | OS, kernel, uptime, CPU count, memory, disk usage |
| Running Services | All active services (systemd on Linux, Windows Services on Windows) |
| Enabled Services | Services enabled at boot |
| Listening Ports | TCP listening sockets with associated processes |
| Installed Packages | Package list from rpm or dpkg (Linux), or installed programs list (Windows) |
| Package Versions | Explicit version extraction for common packages (nginx, PostgreSQL, Redis, Docker, etc.) |
| Containers | Docker/Podman containers with image, status, and ports |
| Cron Jobs | System and per-user cron jobs |
| Network Interfaces | All network interfaces with addresses |
| Mounted Filesystems | Non-virtual mounted filesystems |
| Hardware Info | CPU model, memory, disks |
| Web Servers | Running web servers (nginx, Apache, Caddy, HAProxy) |
| Databases | Running databases (PostgreSQL, MySQL, MongoDB, Redis, Valkey, Memcached, Elasticsearch) |
| Login Users | Non-system user accounts with shell and group membership |
Categories
Each inventory item is classified into one of these categories:
| Category | Examples |
|---|---|
| `system` | OS version, kernel, CPU, memory, disk |
| `web` | Nginx, Apache, Caddy, HAProxy |
| `database` | PostgreSQL, MySQL, Redis, Valkey, MongoDB, Elasticsearch |
| `mail` | Postfix, Dovecot, OpenDKIM |
| `container` | Docker containers, Podman containers |
| `network` | Network interfaces, listening ports |
| `storage` | Mounted filesystems, disks |
| `security` | Fail2ban, SELinux, firewall |
| `monitoring` | Monitoring agents, metrics collectors |
| `log` | Rsyslog, journald, logrotate |
| `user` | Login user accounts |
| `scheduler` | Cron jobs, systemd timers |
PDF export
Click the Inventory button at the top of the Agent Assets page to download a fleet-wide inventory report covering all agents with completed inventories. The PDF includes categorized service lists with versions and status for each server.
As with security reports, use the Schedules popover to enable automatic email delivery. Changing the schedule also syncs every agent's inventory scan schedule.
Scheduled inventories
You can configure automatic recurring inventories per agent. Open the System Inventory modal and use the schedule selector in the top-right corner to choose a frequency:
- Manual only — No automatic scans (default).
- Daily — Runs once every 24 hours.
- Weekly — Runs once every 7 days.
- Monthly — Runs once every 30 days.
The scheduler checks every 15 minutes and triggers inventories for agents that are overdue. A yellow badge (D, W, or M) appears on the agent card to indicate an active schedule.
Constraints
- Only one inventory can run per agent at a time.
- Each agent stores only its latest inventory result (previous results are overwritten).
- The agent must be online to start an inventory (manual or scheduled).
- The `agents` permission is required to start inventories and change the schedule. All authenticated users can view results.
SSH & Sudo Access
ManageLM includes a built-in access scanner that discovers SSH authorized keys and sudo privileges across your infrastructure. Check commands are defined on the portal (`reports/ssh_keys.json`) and executed on the agent in a read-only sandbox. Fully deterministic — no LLM involved. Discovered SSH key fingerprints are matched against ManageLM user profiles for identity resolution.
How it works
- Trigger — Open an agent's detail panel on the Agent Assets page. Click the SSH & Sudo button to open the access scan modal, then click Scan Access.
- Collect — The portal sends an `ssh_keys_scan_request` to the agent over WebSocket. The agent reads `/etc/passwd`, parses `~/.ssh/authorized_keys` for each user, computes SHA256 fingerprints, and parses `/etc/sudoers` + `/etc/sudoers.d/*` with group membership resolution. No LLM calls.
- Results — The combined data is returned to the portal and displayed in the modal. SSH key fingerprints are matched against public keys registered in ManageLM user profiles (Settings → Security → SSH Public Keys) — matched keys show the user's name in a green badge, unmatched keys show as "Unknown".
What is collected
| Data | Source | Details |
|---|---|---|
| SSH authorized keys | ~/.ssh/authorized_keys | Key type, SHA256 fingerprint, comment, full public key, line number |
| Sudo user rules | /etc/sudoers | Target host, runas user, commands, NOPASSWD flag, source file |
| Sudo group rules | /etc/sudoers + /etc/group | Group rules (e.g. %wheel) expanded to individual users via group membership |
Identity mapping
ManageLM users can register their SSH public keys in Settings → Security → SSH Public Keys. When the access scan discovers a key on a server, its SHA256 fingerprint is matched against registered keys to identify the owner. This creates a complete map of who has access to what and what they can do (SSH + sudo).
- Green badge — Key matched to a ManageLM user profile.
- Gray "Unknown" badge — Key not registered by any ManageLM user.
Sudo rules with NOPASSWD are highlighted in red as a security concern.
The `user@host` comment in `authorized_keys` is unreliable — identity is resolved exclusively via SHA256 fingerprint matching against registered profiles.
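OpenSSH-style SHA256 fingerprints are computed from the base64-decoded key blob, never from the comment, which is what makes the matching reliable. A stdlib sketch:

```python
import base64
import hashlib

def ssh_fingerprint(authorized_keys_line: str) -> str:
    # "ssh-ed25519 AAAA... user@host" -> "SHA256:<base64 digest, no padding>"
    key_blob = base64.b64decode(authorized_keys_line.split()[1])
    digest = hashlib.sha256(key_blob).digest()
    return "SHA256:" + base64.b64encode(digest).decode().rstrip("=")
```

Two `authorized_keys` lines with the same key material but different comments produce the same fingerprint, so a renamed or mislabeled key still matches its registered owner.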
MCP integration
The access scan powers natural-language access management via Claude:
- `search_ssh_keys` — Search SSH keys across your infrastructure. Combines two data sources: registered keys from ManageLM user profiles (with full public key content) and deployed keys found by access scans on servers. Examples: "Who has SSH access to pocmail?", "Get Charly's SSH key", "List unknown SSH keys".
- `search_sudo_rules` — Search sudo privileges from access scan results. Examples: "Show me Charly's sudo authorizations", "List all NOPASSWD sudo rules on production".
- `list_team_members` — List ManageLM users with their roles, permissions, and registered SSH public keys. Used to look up a team member's key before granting access.
- `run_access_scan` — Trigger a fresh scan and wait for results.
- Combined with the `users` skill (`ssh_key` + `sudo` operations): "Give Charly SSH access to pocmail", "Add authorization for Charly to reboot all production servers", "Remove all access for Yoann". Claude automatically looks up the team member's registered SSH key via `search_ssh_keys` before dispatching the task.
PDF export
Click the SSH & Sudo button at the top of the Agent Assets page to download a fleet-wide access report. The PDF includes SSH keys and sudo rules per user per server, with NOPASSWD rules highlighted.
Scheduled scans
Configure automatic recurring scans per agent via the schedule selector in the modal header, or for all agents via the Schedules popover in the Agent Assets toolbar. Frequencies: Manual / Daily / Weekly / Monthly.
Constraints
- Only one scan can run per agent at a time.
- Each agent stores only its latest scan result (previous results are overwritten).
- The agent must be online to start a scan.
- The `reports` permission is required to start scans and change schedules. All authenticated users can view results.
Activity Audit
ManageLM includes a built-in activity audit that tracks user activity on your servers. Check commands are defined on the portal (`reports/activity.json`) with time-window parameters resolved per scan, and are executed on the agent in a read-only sandbox. On Linux there is no auditd dependency — it works on any distribution. On Windows, the audit uses the Windows Event Log. Fully deterministic, no LLM needed.
How it works
- Trigger — Click the Activity tab on an agent card in the Agent Assets page, then click Run Activity Audit.
- Scan — The agent collects activity for the configured time window.
- Parse — Events are normalized, deduplicated, and system accounts are filtered out.
- Identity — Full names (including LDAP/SSSD users) are matched against ManageLM users — matched users appear as green badges.
- Results — Displayed in the Activity Audit modal with dashboard cards and detail tables.
What the report shows
- Login Success — Successful SSH/console logins with user, timestamp, and source IP (from `wtmp`).
- Login Failed — Failed login attempts with user, timestamp, and source IP (from `btmp`).
- Sudo / Elevated Commands — Linux: all commands run via `sudo` (always shows the real user, even after `sudo su -`; supports compressed rotated log files). Windows: elevated PowerShell sessions from the Windows Event Log.
- Files Changed — Config files modified under `/etc` and `/var/spool/cron`, detected by modification time. Common system noise files are filtered.
- Package Changes — Packages installed, updated, or removed. Linux: from `rpm --last` (RHEL/CentOS) or `dpkg` logs (Debian/Ubuntu). Windows: from the Windows Event Log.
- Service Changes — Services started, stopped, or failed (from the systemd journal on Linux, Windows Event Log on Windows).
- Reboots — System reboot events with kernel version.
Time windows
Each audit collects data for a rolling time window:
- Manual runs — last 24 hours
- Daily schedule — last 24 hours
- Weekly schedule — last 7 days
- Monthly schedule — last 30 days
PDF export & scheduled reports
Click the Activity button at the top of the Agent Assets page to download a fleet-wide activity audit report as PDF. Use the Schedules popover to configure automatic report emails.
Constraints
- Only one audit can run per agent at a time.
- Each agent stores only its latest audit result.
- The agent must be online.
- The `reports` permission is required to start audits and change schedules.
Service Dependencies
The Service Dependencies scan discovers cross-server service dependencies across your infrastructure. It shows what each server provides, what it depends on, and highlights connections between managed agents.
How it works
- Trigger — Click the Service Dependencies button at the top of the Agent Assets page.
- Scan — The portal sends a `dependency_scan_request` to all online agents simultaneously. A progress modal shows each agent's scan status in real time.
Collect — Each agent runs a fully deterministic scan (no LLM needed):
- Provides — discovers all listening TCP services via
ss. - Depends on — discovers outbound connections (established TCP) plus config-file parsing for intermittent dependencies.
- All hostnames are resolved to IPs locally on the agent before reporting.
- Provides — discovers all listening TCP services via
- Report — The portal matches dependency IPs against known agent IPs to identify managed vs external connections, and displays a per-agent report.
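The managed-versus-external matching in the report step can be modeled like so (an illustrative sketch, not the portal's implementation):

```python
def classify_dependencies(outbound, agent_ips):
    """outbound: (ip, port) pairs reported by one agent;
    agent_ips: {ip: agent_hostname} for every managed agent."""
    report = []
    for ip, port in outbound:
        owner = agent_ips.get(ip)
        report.append({
            "target": f"{ip}:{port}",
            "kind": "managed" if owner else "external",
            "agent": owner,    # None for unmanaged servers
        })
    return report
```

Because agents resolve hostnames to IPs before reporting, the portal only ever compares IPs, which keeps the matching deterministic.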
What is scanned
| Source | What it finds |
|---|---|
| Established connections | All active outbound TCP connections to non-local IPs |
| Nginx configs | proxy_pass, upstreams, fastcgi_pass, uwsgi_pass, grpc_pass |
| Apache configs | ProxyPass, ProxyPassReverse, RewriteRule [P] |
| HAProxy config | Backend server definitions |
| Caddy config | reverse_proxy targets |
| .env files | DATABASE_URL, REDIS_URL, DB_HOST, SMTP_HOST, and many more |
| Docker Compose | Environment variables with connection strings |
| WordPress | DB_HOST in wp-config.php |
| Database replication | MySQL master-host, PostgreSQL primary_conninfo, Redis replicaof |
| Mail configs | Postfix relayhost and lookup tables, Dovecot auth backends |
| LDAP configs | ldap.conf, sssd.conf, nslcd.conf URI/host directives |
| NFS/CIFS mounts | Network mounts in /etc/fstab |
| Systemd units | Environment variables with connection strings in service files |
| Prometheus | Scrape targets in prometheus.yml |
| DNS resolvers | /etc/resolv.conf nameservers |
| NTP servers | ntp.conf, chrony.conf, timesyncd.conf |
| Syslog targets | Remote syslog destinations in rsyslog configs |
| SNMP traps | Trap sink destinations in snmpd.conf |
| Backup clients | Bacula, Bareos, Borg, Restic server addresses |
| Zabbix agent | Server= directive in zabbix_agentd.conf |
| Generic /etc sweep | URLs with host:port and raw IP:port patterns across all /etc files |
Report format
Each agent's section shows:
- Provides — green badges for each listening service and port.
- Depends on — each dependency with a managed badge (connection to another managed agent) or external badge (connection to an unmanaged server).
- Used by — which other managed agents connect to this server.
Constraints
- All agents must be online to participate in the scan.
- No database storage — results are computed on demand and held temporarily in Redis (2 minutes).
- The `reports` permission is required to run a dependency scan.
- The scan is fully deterministic — no LLM is used.
Connectors
Connectors wire ManageLM up to external systems. They come in two kinds, selectable as tabs in the Add Connector modal:
- Cloud Hosting — pull infrastructure inventory (VMs, volumes, networks, security groups) from a cloud provider and auto-match the VMs to your ManageLM agents.
- SIEM Integration — push task-completion events from your agents out to an external SIEM (Splunk, Elasticsearch, or a generic JSON webhook).
Both kinds share the same permission (`perm_connectors`), the same encryption at rest (AES-256-GCM, requires `ENCRYPTION_KEY`), the same storage table, and the same CRUD pages. What differs is the data flow: cloud connectors pull on a schedule, SIEM connectors push as tasks complete.
Cloud Hosting
Sync your cloud resources (VMs, volumes, networks, security groups) and auto-match them to ManageLM agents by IP address and hostname.
Supported providers
- Microsoft Azure — Service principal auth (tenant ID, client ID, client secret, subscription ID)
- Amazon AWS — Access key auth (access key ID, secret access key, region)
- Google Cloud — Service account auth (project ID, service account JSON key)
- VMware vSphere — Session auth (vCenter URL, username, password). Requires vSphere 7.0+
- Proxmox VE — API token auth (API URL, token ID, token secret)
- OpenStack — Keystone v3 password auth (auth URL, username, password, project, domain, region). Defaults to OVH endpoint
How it works
- Go to Connectors in the sidebar and click Add Connector.
- On the Cloud Hosting tab, select a provider, enter a name, and fill in your credentials.
- Click Save — the connector syncs automatically on creation.
- Use Edit → Test Connection to verify credentials at any time.
- Cloud resources appear in the connector's expanded view and on agent cards in the Agent Assets page.
What is synced
- VMs / Instances — name, status, IPs, instance type, availability zone, security groups, tags
- Volumes / Disks — name, size, type, encryption, attachment
- Networks — VPCs, subnets, CIDRs
- Security Groups / Firewalls — rules with direction, protocol, ports, source/destination
Agent matching
After each sync, ManageLM automatically matches cloud VMs to agents by comparing IP addresses and hostnames. Matched agents show a provider badge (e.g. AWS, Azure) on their card in the Agent Assets page. Expanding an agent card shows the full cloud metadata (instance type, zone, IPs, disks, security groups, tags).
MCP integration
Claude can query your cloud inventory using three built-in tools:
- `list_connectors` — list configured cloud connectors with sync status and resource counts
- `search_cloud` — search VMs, volumes, networks, and security groups across all providers
- `get_cloud_info` — detailed info for a single cloud resource with linked agent data
These tools are hidden until at least one cloud connector exists — a SIEM-only tenant will not see them in Claude's tool catalog.
Sync schedule
Each connector syncs on a configurable interval: every 1 hour, 6 hours, 12 hours, or 24 hours. Manual sync is available from the connector list (refresh icon). Syncs are distributed across portal instances using Redis locks to prevent duplicates.
Security
- Connector credentials are AES-256-GCM encrypted at rest (requires the `ENCRYPTION_KEY` env var).
- Non-secret fields (region, project ID, URLs) are stored separately and visible when editing.
- All cloud API calls have 30-second timeouts and SSRF protection (private IP ranges blocked).
- Error messages are sanitized to prevent credential leakage in logs or UI.
SIEM Integration
Forward task-completion events from your agents directly to an external SIEM. Useful for compliance, centralized security monitoring, and audit trails outside ManageLM's own database. Forwarding is additive — the portal's own task log and audit trail are unchanged.
Supported destinations
- Splunk HEC — HTTP Event Collector. Needs the HEC URL and a token. Optional Splunk index and sourcetype. Pasted tokens may include a Splunk prefix — ManageLM strips it automatically.
- Elasticsearch (_bulk) — needs the cluster URL, an index name, and a base64-encoded API key. The ApiKey prefix is stripped if pasted in.
- Generic JSON Webhook — POSTs the event envelope as JSON to any URL. Optional raw Authorization header (e.g. Bearer <token>) and optional HMAC-SHA256 secret (the agent signs the raw body; the digest is sent in X-ManageLM-Signature: sha256=<hex> so you can verify integrity on the receiving end).
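For the generic webhook, your receiver can verify the signature by recomputing the HMAC over the raw request body. A minimal sketch: the header name and the sha256=<hex> format come from the description above; everything else (framework, secret storage) is up to your receiver.

```python
import hashlib
import hmac

def verify_signature(raw_body: bytes, header_value: str, secret: str) -> bool:
    """Check an X-ManageLM-Signature header ("sha256=<hex>") against the raw body."""
    if not header_value.startswith("sha256="):
        return False
    expected = hmac.new(secret.encode(), raw_body, hashlib.sha256).hexdigest()
    # constant-time comparison to avoid timing side channels
    return hmac.compare_digest(header_value[len("sha256="):], expected)
```

Compute the HMAC over the exact bytes received, before any JSON parsing or re-serialization, or the digests will not match.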
What gets forwarded
One event per completed task — the same rows you see in the Command History panel of an Agent's detail page. Nothing else is forwarded: no heartbeats, no config pushes, no LLM traffic.
{
"ts": "2026-04-17T14:23:11Z",
"agent": { "hostname": "prod-web-01" },
"task": {
"id": "...",
"skill": "firewall",
"instruction": "block 1.2.3.4",
"status": "completed",
"output": "...",
"error": null,
"files_changed": ["/etc/nftables.conf"]
}
}
Splunk wraps this in {"event": <envelope>, "sourcetype": "...", "index": "...", "host": "..."}. Elasticsearch sends it as an NDJSON _bulk body (action line + doc line). Webhook sends a JSON array of envelopes per batch.
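For reference, the Elasticsearch variant pairs each envelope with an action line in NDJSON. A sketch of how such a batch body is assembled (the index name is illustrative; the portal's actual default is not documented here):

```python
import json

def to_bulk_body(envelopes, index="managelm-tasks"):
    """Build an NDJSON _bulk body: one action line plus one doc line per event."""
    lines = []
    for env in envelopes:
        lines.append(json.dumps({"index": {"_index": index}}))
        lines.append(json.dumps(env))
    # Elasticsearch requires _bulk bodies to end with a newline
    return "\n".join(lines) + "\n"
```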
How it works
- Go to Connectors in the sidebar and click Add Connector.
- Switch to the SIEM Integration tab, pick a type, enter a name, fill in the endpoint and credentials, and save. A Test Connection runs automatically on create.
- Open an Agent detail page — or a Server Group — and pick the new SIEM from the SIEM Forwarding dropdown.
- From that point on, every task completed by that agent fires a POST to the SIEM, in parallel with the normal task-result report to the portal.
Assignment and inheritance
Each agent has at most one SIEM destination. It resolves as:
- If the agent itself has a direct override → that destination wins.
- Else if its group(s) point at a single destination → inherit that one.
- Else → no forwarding.
If an agent belongs to several groups whose SIEM settings differ, the portal refuses to guess — the agent gets a red SIEM CONFLICT badge on the Agent Assets list until you set an explicit per-agent override to resolve the conflict.
Agent groups show their SIEM destination as a small → <connector name> pill on the group card (read-only view).
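The resolution order described above can be sketched as a small function (data shapes hypothetical):

```python
def resolve_siem(agent_override, group_destinations):
    """Resolve an agent's SIEM destination.

    agent_override: connector id or None (direct per-agent setting)
    group_destinations: connector ids from the agent's groups
    Returns the connector id, None for no forwarding, or "CONFLICT".
    """
    if agent_override is not None:
        return agent_override            # direct override always wins
    distinct = {d for d in group_destinations if d is not None}
    if len(distinct) == 1:
        return distinct.pop()            # single inherited destination
    if len(distinct) > 1:
        return "CONFLICT"                # surfaced as the red SIEM CONFLICT badge
    return None                          # no forwarding
```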
Transport and reliability
- Fire-and-forget. The agent enqueues the event into a bounded in-memory queue (512 events) and a background worker POSTs it. Task execution never blocks on SIEM delivery.
- Circuit breaker. After 5 consecutive failures the agent backs off for 60s before retrying — a down SIEM does not get hammered.
- No persistence. If the agent is restarted while the queue has events, those events are dropped. The portal's own task log always has the same data — SIEM forwarding is a fan-out, not a queue-of-record.
- Config changes flush the queue. Reassigning an agent from Splunk A to Splunk B discards events still destined for A, preventing cross-tenant leakage.
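A condensed sketch of the delivery pattern described above. The queue size, failure threshold, and backoff come from the text; class and method names are illustrative, not the agent's actual code:

```python
import collections
import time

class SiemForwarder:
    """Bounded fire-and-forget queue with a simple circuit breaker."""

    def __init__(self, post, max_queue=512, failure_threshold=5, backoff=60):
        self.post = post                     # callable that delivers one event
        self.queue = collections.deque(maxlen=max_queue)  # oldest dropped when full
        self.failure_threshold = failure_threshold
        self.backoff = backoff
        self.failures = 0
        self.open_until = 0.0                # breaker is open until this timestamp

    def enqueue(self, event):
        self.queue.append(event)             # never blocks task execution

    def drain(self, now=None):
        now = time.monotonic() if now is None else now
        if now < self.open_until:
            return                           # breaker open: back off, don't hammer the SIEM
        while self.queue:
            event = self.queue[0]
            try:
                self.post(event)
                self.queue.popleft()
                self.failures = 0
            except Exception:
                self.failures += 1
                if self.failures >= self.failure_threshold:
                    self.open_until = now + self.backoff
                return

    def reconfigure(self):
        self.queue.clear()                   # destination change flushes pending events
```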
Security
- SIEM tokens and HMAC secrets are AES-256-GCM encrypted at rest, decrypted only to build the per-agent config. They are never logged.
- The Test Connection button runs from the portal — private SIEMs behind NAT will therefore fail the portal-side test even if agents can reach them; the actual forwarding still works. (A future iteration may route the test through a chosen agent.)
- The generic webhook supports HMAC-SHA256 signing so your receiver can drop unsigned or tampered events.
- Because events travel agent → SIEM directly (not via the portal), customer event data never transits the ManageLM SaaS — relevant for data-residency and compliance requirements.
Permissions
Creating, editing, or deleting a SIEM connector requires the Connectors permission (perm_connectors) — the same gate as cloud connectors. Assigning a SIEM destination to an agent also requires the Agents permission; assigning one to a group requires the Groups permission.
Permissions (shared)
The Connectors permission (perm_connectors) covers both kinds. Owners and admins have full access. Members need the permission toggled on in Users & Roles.
Change Tracking
ManageLM automatically tracks file changes made by every mutating task. Each agent maintains a local git repository that snapshots tracked directories before and after task execution, producing a precise record of what changed, when, and by which task.
How it works
- Pre-snapshot — Before a task executes, the agent syncs all tracked files into its local git repo and commits a baseline.
- Task execution — The task runs normally (LLM-driven commands).
- Post-snapshot — After the task completes, the agent syncs again, commits the delta, and computes the list of changed files.
- Report — Changeset metadata (files changed, commit hashes, summary) is sent to the portal and stored in the database. The full diff stays in the agent’s local repo.
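Conceptually, the pre/post snapshot reduces to diffing two content maps. The real implementation commits to a dulwich git repo; this hash-based sketch only illustrates the idea of computing the changed-file list:

```python
import hashlib
from pathlib import Path

def snapshot(root: str) -> dict:
    """Map each tracked file path to a content hash (symlinks skipped)."""
    result = {}
    for p in Path(root).rglob("*"):
        if p.is_file() and not p.is_symlink():
            result[str(p)] = hashlib.sha256(p.read_bytes()).hexdigest()
    return result

def changed_files(pre: dict, post: dict) -> list:
    """Paths added, removed, or modified between two snapshots."""
    paths = set(pre) | set(post)
    return sorted(p for p in paths if pre.get(p) != post.get(p))
```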
What is tracked
| Aspect | Detail |
|---|---|
| Tracked directories | /etc/ — covers SSH, nginx, firewall, cron, sudoers, sysctl, network config, and more |
| Skipped content | Binary files, files > 512 KB, symlinks, and noisy directories (ssl/certs, pki/ca-trust, firmware, kernel, selinux/targeted/policy) |
| Git implementation | dulwich (pure Python) — no git CLI needed on the host |
| Repo location | /opt/managelm/git/ on each agent |
| Retention | 30 days — older commits are automatically pruned daily |
Viewing changes
When a task modifies tracked files, a changeset badge appears on the task in the task log (in the Agent Detail page and the MCP Log). The badge shows the number of files changed.
With MCP (Claude), use the built-in get_task_changes tool to inspect what a task modified:
get_task_changes(task_id="...", full_diff=true)
This returns:
- List of changed file paths
- A summary (e.g. “Modified 3 files in /etc/nginx/, /etc/ssh/”)
- Optionally, the full unified diff (when full_diff=true) — fetched on demand from the agent’s local repo
Reverting changes
If a task made unwanted changes, you can revert them to restore the previous file state. Use the revert_task MCP tool:
revert_task(task_id="...")
This fetches the diff from the agent’s local git repo and applies a reverse patch, restoring the files to their pre-task state. The revert is tracked as a separate changeset.
Non-mutating tasks
Tasks classified as read-only (non-mutating) by the LLM skip the snapshot process entirely — no changeset is created. This keeps the git history clean and avoids unnecessary I/O for read-only operations like status checks and log queries.
Audit Log
The Audit Log provides a chronological record of all administrative actions performed in your account. It is accessible from the Audit Log entry in the sidebar.
What is logged
Every significant action is automatically recorded, including:
| Category | Actions |
|---|---|
| Authentication | Login, logout |
| Users | Invite, update role/permissions, delete, transfer ownership |
| Agents | Approve, delete, update settings, bulk actions |
| Skills | Create, import, update, delete, document upload/delete |
| Groups | Create, update, delete, member changes |
| Webhooks | Create, update, delete |
| API Keys | Create, delete |
| MCP | Configuration changes (IP whitelist, etc.) |
| Account | Settings changes, license activation/removal |
Log entry details
Each entry records:
- Timestamp — When the action occurred (displayed in your configured timezone).
- User — Who performed the action (name and email).
- Action — The action type (e.g. agent.approved, skill.created, user.login).
- Target — The affected resource name (e.g. agent hostname, skill name, user email).
- IP address — The client IP address.
Access control
- Owners and admins see all log entries across the account.
- Members with the logs permission see all entries.
- Members without the logs permission only see their own actions.
Features
- Search filter — Filter entries by user name, email, action type, target name, or IP address.
- Pagination — Results are paginated (50 per page) with navigation controls.
Reporting
The Reporting page provides a historical view of all task executions across your account. Use it to review what was run, by whom, on which agent, and what the outcome was.
Access
Reporting is visible to admin and owner roles by default. Members need the perm_reports permission enabled (configurable in Users & Roles).
Features
- Date range — pick a start and end date to narrow the report window (defaults to the last 30 days).
- Search filter — client-side text filter across user, agent, instruction, and summary fields.
- Expandable rows — click any entry to see full details: task ID, timestamps, the original request, and the agent-generated summary.
- Pagination — results are paginated (50 per page) with navigation controls.
- PDF export — click Export PDF to download a professional report covering the selected date range, including statistics and a task table.
Agent summaries
Each task entry may include a summary — a short description auto-generated by the agent’s LLM after completing the task. These summaries make it easy to scan results without reading raw command output.
Webhooks
Get notified when things happen in your account.
Available events
| Event | Fires when |
|---|---|
| agent.enrolled | A new agent requests enrollment |
| agent.approved | An agent is approved |
| agent.online | An agent connects |
| agent.offline | An agent disconnects |
| task.completed | A task finishes successfully |
| task.failed | A task fails |
| report.completed | A security audit or inventory scan completes |
| report.failed | A security audit or inventory scan fails |
| monitor.down | A service monitor goes down (after consecutive failure threshold) |
| monitor.up | A service monitor recovers from down |
| cert.issued | A new certificate is issued and deployed to an agent |
| cert.revoked | A certificate is revoked (CRL updated, LE notified for LE certs) |
| cert.renewed | A certificate is automatically renewed by the daily sweep |
| cert.renewal_failed | Automatic certificate renewal failed |
| cert.reactivated | A revoked certificate is reactivated (internal CA only) |
| cert.deleted | A certificate is soft-deleted from the portal |
Configure webhooks from Settings → MCP & API. Enter a URL, select events, and optionally provide an HMAC secret. Payloads are signed with HMAC-SHA256 via the X-Webhook-Signature header when a secret is configured.
Delivery retries up to 3 times with exponential backoff. After 10 consecutive failures, the webhook is automatically disabled. Re-enabling it resets the counter. Maximum 25 webhooks per account.
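The retry behavior can be sketched as follows. The three-attempt limit comes from the text; the exact delay schedule is not documented here, so the doubling base delay is an assumption:

```python
import time

def deliver_with_retries(post, payload, max_attempts=3, base_delay=1.0, sleep=time.sleep):
    """Attempt a webhook delivery up to max_attempts times with exponential backoff.

    Returns True on success, False once all attempts are exhausted.
    """
    for attempt in range(max_attempts):
        try:
            post(payload)
            return True
        except Exception:
            if attempt < max_attempts - 1:
                sleep(base_delay * (2 ** attempt))  # e.g. 1s, then 2s
    return False
```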
In-App Notifications
The portal includes a real-time notification system accessible from the Notifications bell in the sidebar. Notifications are delivered alongside email alerts for key events.
Notification triggers
- Agent enrolled — A new agent requests approval (account-wide, visible to all admins).
- Agent approved — An agent is approved (account-wide).
- Agent updated — An agent applied an auto-update (account-wide).
How it works
- An unread count badge appears on the notification bell when new notifications arrive.
- Click the bell to open the dropdown panel. All unread notifications are marked as read automatically.
- Notifications link to the relevant page (e.g. agent detail, request log).
- The bell polls for new notifications every 30 seconds.
- Use the clear button to remove read notifications from the list.
Deployment & .env
The portal is configured via environment variables in a .env file. Below is a reference of all available settings.
Core
| Variable | Required | Default | Description |
|---|---|---|---|
| DATABASE_URL | Yes | — | PostgreSQL connection string |
| SERVER_PORT | No | 3000 | HTTP listen port |
| SERVER_URL | Yes | — | Full public URL (e.g. https://portal.example.com) |
| ACCESS_TOKEN_TTL | No | 86400 | Access token lifetime in seconds (24h). Tokens are opaque random strings stored in Redis — no signing secret. |
| REFRESH_TOKEN_TTL | No | 2592000 | Refresh token lifetime in seconds (30d). |
| DEFAULT_TIMEZONE | No | UTC | Default timezone for new users |
| TASK_TIMEOUT_SECONDS | No | 300 | Max duration for synchronous task execution (seconds) |
| FILE_TRANSFER_MAX_BYTES | No | 26214400 | Max file transfer size (default 25 MB) |
| TOS_URL | No | — | URL to Terms of Service page. When set, signup forms require ToS acceptance. |
| LOG_LEVEL | No | info | Log verbosity: trace, debug, info, warn, error, fatal, silent |
| CLUSTER_WORKERS | No | 2 | Number of Node.js cluster workers. Set to 1 to disable clustering. |
| SERVER_MODE | No | selfhosted | saas = hosted SaaS (trial LLM available), selfhosted = Docker/on-prem (proxied LLM available). |
| NOTIFY_EMAIL | No | — | Email address for platform operator alerts (account created/deleted notifications). |
| ENCRYPTION_KEY | No | — | AES-256 key for encrypting connector credentials at rest (cloud provider secrets and SIEM tokens). Required to use Connectors. Generate with: openssl rand -hex 32 |
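A minimal working .env for a self-hosted deployment might look like this (hostnames and credentials are placeholders, not defaults):

```shell
# .env — minimal self-hosted configuration (values are placeholders)
DATABASE_URL=postgres://managelm:secret@localhost:5432/managelm
REDIS_URL=redis://localhost:6379
SERVER_URL=https://portal.example.com
SMTP_FROM=portal@example.com
# optional, but required to use Connectors (generate with: openssl rand -hex 32)
ENCRYPTION_KEY=replace-with-64-hex-chars
```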
SMTP & DKIM
| Variable | Required | Default | Description |
|---|---|---|---|
| SMTP_HOST | No | — | SMTP server hostname. When empty, emails are sent directly to recipient MX servers (no mail server required). |
| SMTP_PORT | No | 25 | SMTP server port |
| SMTP_FROM | Yes | — | From address for all emails |
| SMTP_SECURE | No | none | none = plain (localhost:25), starttls = upgrade via STARTTLS (587), tls = implicit TLS (465) |
| SMTP_USER | No | — | SMTP auth username (for external relays) |
| SMTP_PASS | No | — | SMTP auth password |
| DKIM_DOMAIN | No | — | Domain for DKIM signing (e.g. example.com) |
| DKIM_SELECTOR | No | default | DKIM selector (matches DNS TXT record) |
| DKIM_PRIVATE_KEY_PATH | No | — | Path to PEM private key file |
| DKIM_PRIVATE_KEY | No | — | Inline PEM private key (use \n for newlines) |
When DKIM_DOMAIN and a private key are set, all outgoing emails are signed with DKIM (RSA-SHA256). You also need to publish a DNS TXT record at {selector}._domainkey.{domain} with the matching public key.
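The DNS record follows the standard DKIM TXT layout. For example, with DKIM_DOMAIN=example.com and DKIM_SELECTOR=default it would look like this (public key truncated; the p= value is the base64 of your RSA public key):

```
default._domainkey.example.com.  IN  TXT  "v=DKIM1; k=rsa; p=MIIBIjANBgkq..."
```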
Redis (required)
| Variable | Required | Default | Description |
|---|---|---|---|
| REDIS_URL | Yes | — | Redis connection URL (e.g. redis://localhost:6379). Supports redis://, rediss://, valkey://, valkeys:// schemes. |
| REDIS_TLS | No | auto | auto = TLS if URL uses rediss:// or valkeys://, on = force TLS, off = no TLS |
| REDIS_DB | No | 0 | Logical database number (0–15). Useful when sharing a Redis instance. |
Redis is a mandatory component used for:
- MCP session persistence — sessions survive portal restarts.
- Cross-instance messaging — pub/sub for cache invalidation, agent updates, and tool change notifications.
- In-app notifications — ephemeral per-user notification storage.
- Distributed locks — ensures background maintenance runs on only one instance.
- Horizontal scaling — multiple portal instances share state.
Database
| Variable | Required | Default | Description |
|---|---|---|---|
| DB_POOL_MAX | No | 20 | Max PostgreSQL connection pool size |
| DB_SSL | No | none | none = no SSL, require = SSL (skip cert verify), verify = full CA verification, verify-ca = custom CA cert |
| DB_SSL_CA | No | — | Path to CA certificate file (used with DB_SSL=verify-ca) |
| TASK_LOG_RETENTION_DAYS | No | 30 | Days to keep task log entries |
| AUDIT_LOG_RETENTION_DAYS | No | 90 | Days to keep audit log entries |
| TASK_LOG_MAX_PER_ACCOUNT | No | 5000 | Max task log entries per account |
| AUDIT_LOG_MAX_PER_ACCOUNT | No | 10000 | Max audit log entries per account |
| SESSION_RETENTION_DAYS | No | 30 | Days before inactive login sessions are deleted |
| PENDING_AGENT_RETENTION_DAYS | No | 14 | Days before unapproved agent enrollments are deleted |
| EMAIL_VERIFY_RETENTION_DAYS | No | 7 | Days before stale email verification tokens are cleared |
| MONITOR_RETENTION_DAYS | No | 90 | Days to keep monitor events. Rollups are kept 4× longer for trend charts. |
Performance notes
The portal includes several built-in performance optimizations for high-load deployments:
- MCP tool caching — Generated tool lists are cached per account (60s TTL), automatically invalidated when skills are created, modified, or deleted. Assigning or removing skills from agents/groups does not change the MCP tool list.
- Auth caching — MCP authentication results are cached (30s TTL) to avoid repeated DB lookups.
- Batched heartbeats — Agent heartbeat writes are batched and flushed every 5 seconds instead of one DB write per heartbeat.
- Webhook caching — Webhook configurations are cached per account (60s TTL).
- Optimized queries — MCP tool calls use a single combined query instead of multiple sequential lookups.
- Database indexes — Indexes on hot-path columns (token hashes, access control tables, agent lookups).
For high-traffic deployments, increase DB_POOL_MAX and configure REDIS_URL for session persistence and horizontal scaling.
Background Maintenance
The portal automatically cleans up stale data and triggers scheduled work using the background tasks listed below. Each runs on a distributed Redis lock, so only one portal instance executes per interval — no external cron is needed.
| Task | Interval | What it does |
|---|---|---|
| OAuth cleanup | Every 30 min | Deletes expired MCP OAuth tokens and authorization codes |
| Log purge | Every 1 hour | Age-based and count-based pruning of task_log and audit_log |
| Maintenance | Every 6 hours | Cleans all other stale resources (see table below) |
| Scheduled scans | Every 15 min | Triggers security audits and system inventories for agents with a configured schedule (daily/weekly/monthly), and generates scheduled PDF reports for accounts |
All tasks also run once on portal startup.
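The lock pattern is the standard Redis SET NX EX idiom: whichever instance sets the key first runs the task; the TTL releases it for the next interval. A sketch against a minimal stand-in client (the portal's actual key names and TTLs are not documented here):

```python
class FakeRedis:
    """Tiny stand-in for a Redis client supporting SET key val NX EX ttl."""
    def __init__(self):
        self.store = {}
    def set(self, key, value, nx=False, ex=None):
        if nx and key in self.store:
            return None          # redis-py returns None when NX fails
        self.store[key] = value
        return True

def run_once_across_instances(redis, task_name, instance_id, fn, ttl=3600):
    """Run fn only on the instance that wins the lock for this interval."""
    if redis.set(f"lock:{task_name}", instance_id, nx=True, ex=ttl):
        fn()
        return True
    return False                 # another instance already holds the lock
```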
Maintenance targets
| Resource | Cleanup rule | Configurable |
|---|---|---|
| Login sessions | No activity in SESSION_RETENTION_DAYS (default 30) | Yes |
| Expired invitations | Past expires_at and not accepted | — |
| Expired API keys | Past optional expires_at | — |
| Password reset tokens | Past password_reset_expires_at | — |
| Email verification tokens | Unverified accounts older than EMAIL_VERIFY_RETENTION_DAYS (default 7) | Yes |
| WebAuthn challenges | User inactive > 6 hours (abandoned registration flow) | — |
| Pending agent enrollments | Unapproved for PENDING_AGENT_RETENTION_DAYS (default 14) | Yes |
| Monitor events | Older than MONITOR_RETENTION_DAYS (default 90) | Yes |
| Monitor rollups | Older than 4× MONITOR_RETENTION_DAYS (default 360 days) | Yes |
| PKI certificates | Soft-deleted certs after natural expiry, expired certs after 7 days, stale failed/pending after 7 days | Yes |
Configurable retention values can be set via environment variables in .env. See the Deployment & .env section for details.
Reinstalling an Agent
You can reinstall an agent without losing its configuration (skills, groups, members).
- Go to the agent's detail page.
- Click the Reinstall button.
- Copy the install command and run it on the server.
- Approve the re-enrollment when prompted.
The agent gets a fresh access token and signing key while keeping all its existing configuration intact.
Custom Skills
You can create your own skills to extend what agents can do.
- Go to Agent Skills and click Create Skill.
- Define the skill's slug, name, and description.
- Add operations (name and description for each capability).
- Set the allowed commands.
- Write a system prompt that guides the LLM.
Tips for custom skills
- Be specific with allowed commands. Only permit what's needed.
- Write clear system prompts. Tell the LLM what it is, what it can do, and any constraints.
- Write descriptive operations. Each operation's description helps the LLM understand the skill's capabilities.
- Test with the UI first before connecting Claude, using the Run Task button on the agent detail page.
Import / Export
Skills can be exported as JSON files and imported into other accounts. Use the export button on any skill, or import from the skills page.
Skill Documents (RAG)
You can upload reference documentation to any skill. When a task is dispatched, relevant sections are automatically retrieved and injected into the LLM prompt — giving the agent knowledge about products, tools, or APIs that the LLM wasn't trained on.
How it works
- Upload — Drop .txt, .md, .pdf, .html, .doc, or .docx files onto the skill's edit form. Text is extracted automatically and chunked for indexing.
- Retrieve — When a task matches, the portal searches document chunks using the task instruction and retrieves the top matching sections.
- Inject — Matching chunks are injected into the agent's system prompt as a REFERENCE DOCUMENTATION block, before the task instructions.
Uploading documents
- Go to Agent Skills and click the Edit (pencil) icon on a skill.
- Below the Detailed Description field, you'll see the Reference Documents section with a drag-and-drop zone.
- Drop one or more files (.txt, .md, .pdf, .html, .doc, .docx), or click the zone to browse.
- Each uploaded file is shown with its filename, size, chunk count, and upload date.
- To remove a document, click the trash icon next to it.
Chunking
Text is first extracted from the uploaded file (PDF → pdf-parse, DOC/DOCX → mammoth, HTML → tag stripping, TXT/MD → as-is), then split into chunks of ~1000–1500 characters for efficient retrieval:
- Markdown files are split on headings (##). Large sections are further split at paragraph boundaries. Heading context is preserved with each chunk.
- All other formats (plain text, PDF, HTML, DOC/DOCX) are split at paragraph boundaries. Short paragraphs are merged together.
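A simplified version of the markdown path. The heading split and the ~1000–1500 character target come from the text; the exact splitting and merging rules are illustrative:

```python
def chunk_markdown(text, max_len=1500):
    """Split markdown on '## ' headings, then at paragraph boundaries.

    Each chunk keeps its heading as context; sizes approximate the
    documented ~1000-1500 character target.
    """
    chunks = []
    sections = text.split("\n## ")
    for i, section in enumerate(sections):
        if i > 0:
            section = "## " + section        # restore the stripped heading marker
        heading = section.splitlines()[0] if section else ""
        if len(section) <= max_len:
            chunks.append(section.strip())
            continue
        # large section: split at paragraph boundaries, carrying the heading
        current = heading
        for para in section.split("\n\n")[1:]:
            if len(current) + len(para) > max_len and current != heading:
                chunks.append(current.strip())
                current = heading
            current += "\n\n" + para
        chunks.append(current.strip())
    return [c for c in chunks if c]
```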
Retrieval at task time
When a task is sent to an agent, the portal searches the skill's document chunks using PostgreSQL's websearch_to_tsquery. The top matching chunks (up to 10 chunks / 30,000 characters by default) are injected into the system prompt. If no chunks match the instruction, nothing is injected.
Limits
| Limit | Default | Environment Variable |
|---|---|---|
| Max file size | 2 MB | SKILL_DOC_MAX_SIZE_BYTES |
| Max documents per skill | 10 | SKILL_DOC_MAX_PER_SKILL |
| Max total size per skill | 10 MB | SKILL_DOC_MAX_TOTAL_BYTES |
| Max chunks per task | 10 | RAG_MAX_CHUNKS |
| Max chars per task | 30,000 | RAG_MAX_CHARS |
Use cases
- Custom application docs — Upload deployment guides, runbooks, or config references for internal tools.
- API documentation — Give agents context about APIs they need to interact with.
- Product manuals — Upload vendor documentation for products the LLM wasn't trained on.
- Compliance & procedures — Upload SOPs or checklists the agent should follow.