Monitor a Proxmox Cluster Without Datadog (or a Second Mortgage)
Stock Proxmox dashboards are fine until they aren't. Here's how to monitor a real cluster — host metrics, VM state, disks, the works — with a free agent and zero enterprise pricing.
Three nodes. Eleven VMs. Two LXCs. The web UI looks great until the moment a VM panics and reboots itself into a kernel that won't come up — and you find out about it eight hours later because nobody opened the dashboard.
Stock Proxmox monitoring is a live view, not a watchdog. It draws pretty graphs. It does not wake you up. And the moment you start googling "Proxmox monitoring" you fall straight into a tarpit of Prometheus exporters, Grafana dashboards you have to maintain, and SaaS vendors quoting $15 per host per month with a 5-host minimum.
This is the no-nonsense version. Free agent on each Proxmox node, real alerts to your phone, no Helm charts, no YAML cosplay.
What stock Proxmox actually gives you
The built-in Proxmox dashboards are not bad. They're just not enough.
You get:
- Real-time CPU, RAM, disk I/O graphs per node and per VM
- Cluster status (quorum, node up/down)
- Storage utilization
- A task log that scrolls when something interesting happens
You don't get:
- Alerts. Not real ones. There's an email hook for backup jobs and that's about it.
- History past a few weeks at usable resolution, depending on how `pvestatd` is configured.
- A heads-up when a VM stops. The UI shows it as red. Your phone says nothing.
- Disk-fill projection. It'll tell you you're at 91%. It won't tell you that two days ago you were at 78%.
- Anything cross-node. You see one node at a time. If you want to know "is anything on fire across all three?", you click around.
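The disk-fill projection the UI skips is trivial arithmetic once you have two readings. A back-of-the-envelope sketch using the hypothetical numbers from above (78% two days ago, 91% now):

```shell
# Hypothetical readings: 78% two days ago, 91% now.
then_pct=78
now_pct=91
hours_between=48

# Linear projection: percentage points gained, then hours until 100%.
gained=$(( now_pct - then_pct ))
hours_left=$(( (100 - now_pct) * hours_between / gained ))
echo "full in roughly ${hours_left} hours"
```

At those numbers the pool is full in about a day and a half. That is the warning the stock dashboard never gives you.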
For a single homelab node nobody depends on, fine. For a cluster running your Home Assistant, your Plex library, your wife's photo backups, and the Minecraft server your kid screams about — not fine.
The setup we're targeting
I'll assume:
- You have a Proxmox cluster (or a single PVE node — the steps are identical)
- Each node runs Debian 12 underneath (PVE 8.x is Debian 12-based)
- You have SSH and root on each node
- You're allergic to running a separate Prometheus + Grafana + Alertmanager stack just to know if a VM died
We're going to install a single Go binary on each Proxmox node. It auto-discovers what's running, ships metrics, and lets you set threshold alerts that go to email, mobile push, or webhook.
Installing the agent on a Proxmox node
SSH into the first node. The install is one line:
```shell
curl -sSL get.netwarden.com | bash
```
That script:
- Detects Debian 12 and the architecture (almost certainly `amd64` on Proxmox unless you're doing something exotic)
- Drops the binary at `/usr/local/bin/netwarden-agent`
- Installs a systemd unit
- Asks you for the agent token (paste it in — you get it from the dashboard when you add a host)
- Starts the service
Verify it's alive:
```shell
systemctl status netwarden-agent
journalctl -u netwarden-agent -f
```
Within about 60 seconds you should see metrics flowing in the dashboard. Repeat on node 2 and node 3. That's the install done.
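If you'd rather confirm all three nodes from one terminal, a quick loop over SSH does it. A sketch with hypothetical hostnames (`pve1` through `pve3`); substitute your own:

```shell
# Hypothetical node names -- substitute your own.
nodes="pve1 pve2 pve3"

down=0
for n in $nodes; do
  # `systemctl is-active` prints "active" for a running unit; an
  # empty result means the node didn't answer at all.
  state=$(ssh -o ConnectTimeout=5 "root@$n" \
      systemctl is-active netwarden-agent 2>/dev/null)
  state=${state:-unreachable}
  echo "$n: $state"
  [ "$state" = "active" ] || down=$(( down + 1 ))
done
echo "${down} node(s) need attention"
```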
For the long-form version with all the flags, see the installation docs.
What the VM collector picks up automatically
Here's the part that actually matters for Proxmox specifically. The agent ships with a libvirt/KVM/Proxmox VM collector that auto-discovers — no config — every VM the hypervisor knows about.
You don't list your VMs. You don't tag anything. It reads them off the host.
For each VM you get:
- State (running, stopped, paused, shut off, crashed)
- vCPU allocation and host-level CPU time consumed
- Memory allocation and used
- Per-disk I/O (read/write bytes, IOPS)
- Per-NIC traffic (rx/tx bytes)
When you spin up a new VM through the Proxmox UI, the agent picks it up on the next collection cycle. When you destroy one, it disappears. There's no "now go register the VM in the monitoring tool" step.
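To see what the collector has to work with, ask the hypervisor yourself. A sketch that flags non-running VMs from `qm list` output; the sample is hardcoded with made-up VMIDs so the parsing is visible, and on a real node you would pipe `qm list` straight in:

```shell
# Sample `qm list` output with made-up VMs; on a real Proxmox node,
# replace the echo with:  qm list | awk ...
sample='      VMID NAME                 STATUS     MEM(MB)    BOOTDISK(GB) PID
       100 homeassistant        running    4096              32.00 1234
       101 plex                 running    8192              64.00 2345
       102 build-runner         stopped    2048              32.00 0'

# Column 3 is the state; print every VM that is not running.
echo "$sample" | awk 'NR > 1 && $3 != "running" { print $1, $2, $3 }'
```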
This is the same pattern the container collector uses for Docker and Podman — discover, don't configure. The same idea applies to databases: if Postgres or MySQL is running inside one of those VMs, an agent installed inside the guest will pick it up too.
Setting up the two alerts that actually matter
Out of the dozen alerts you could set, two genuinely justify their existence on a Proxmox host. Set these and resist the urge to add more until they've fired in anger.
Alert 1: "VM stopped"
The VM collector emits a per-VM vm_state metric. You want to alert when any production VM transitions out of running.
In the alerts UI, create a threshold alert:
- Metric: `vm_state` (per-VM)
- Condition: state is not `running`
- Duration: 2 minutes (so a quick reboot during patches doesn't page you)
- Notification: email + mobile push
A clean snapshot reboot completes well under two minutes. A VM that is still "stopped" 120 seconds later is a VM that needs your attention.
If you don't want every VM monitored — say you have build-VMs that you stop intentionally — apply the alert by tag or by hostname pattern instead of cluster-wide.
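If you want a feel for how the duration window behaves before trusting the real alert, the logic is easy to approximate in a cron job: only page when a VM shows up stopped on two consecutive checks. A rough sketch, with the real `qm` line commented out and a hardcoded VMID in its place so the flow is visible:

```shell
# State file remembers which VMs were stopped on the previous run; a
# VM "alerts" only when it is stopped twice in a row, which is the
# duration window if this runs from cron every minute or two.
statefile=/tmp/stopped-vms.prev
touch "$statefile"

# On a real node:
#   stopped=$(qm list | awk 'NR > 1 && $3 != "running" { print $1 }')
stopped="102"

for vmid in $stopped; do
  # Already stopped last time too? Then it's been down a full interval.
  if grep -qx "$vmid" "$statefile"; then
    echo "ALERT: VM $vmid still stopped"
  fi
done
printf '%s\n' $stopped > "$statefile"
```

The hosted alert does the same thing with proper metrics and routing; this just makes the "2 minutes, not instantly" reasoning concrete.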
Alert 2: "Host disk above 85%"
This is the one that catches the silent killer: a VM with thin-provisioned disks slowly eating the host's storage pool, or a backup target that nobody pruned in nine months.
- Metric: `disk_used_percent` (per-mount, on the host — not the guest)
- Condition: `> 85`
- Duration: 15 minutes
- Notification: email (no push — this is "deal with it tonight," not "wake up now")
85 is the right number. 90 is too late on a thinly-provisioned ZFS pool — by the time you see it, snapshots are already failing. 80 is too noisy because backup writes routinely push past it.
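The same check is a one-liner from a shell if you want to eyeball it first. A sketch against hardcoded `df -P` output with invented filesystem names; on an actual host you would pipe the real `df -P` in:

```shell
threshold=85

# Sample `df -P` output; on a real host, pipe `df -P` straight in.
sample='Filesystem     1024-blocks     Used Available Capacity Mounted on
/dev/sda3          98304000 89466880   8837120      91% /
tank/vmdata       524288000 94371840 429916160      18% /tank/vmdata'

# Column 5 is Use%; strip the % sign and flag anything over threshold.
echo "$sample" | awk -v t="$threshold" \
  'NR > 1 { pct = $5; sub(/%/, "", pct); if (pct + 0 > t) print $6, $5 }'
```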
For more on tuning thresholds without ending up with alert fatigue, there's a whole post on that.
What the dashboard actually shows you
Once the agents are reporting, the cluster view is what you want for daily glances. The default host page gives you, per node:
- A six-hour CPU graph with a clear "now" line
- Memory used/free as a stacked area
- Per-disk space usage as horizontal bars (the 85% line is visible — you'll spot a creeping disk before it pages)
- A network throughput graph with rx/tx split
- A VM list with state pills (green = running, gray = stopped, amber = paused) and per-VM CPU/memory mini-bars
The bit I actually use most is the cluster-overview dashboard you can build by dragging a few widgets together. One row of node-status widgets, one row of "top 5 VMs by CPU," one big disk-usage table for all mounts across all nodes. That's the whole picture in one screen.
You build it once with custom dashboards. It survives reboots. It works on your phone.
Pricing reality, because I'm not going to lie to you
Free tier: 1 host. That covers a single Proxmox node. If you have a one-node "cluster" (single PVE box, half the homelab world is this), you're done — free forever.
For a real three-node cluster, you need three host slots. The Solo plan is $9.90/month for 5 hosts, which covers three Proxmox nodes plus two extra (a Pi running Pi-hole, a NAS, whatever else you want). Pro is $29.90 for 25 hosts.
Compared to:
- Datadog: $15/host/month, $18 with the APM tier you'll get upsold into. Three Proxmox nodes = $45-$54/month minimum, and that's before they bill you for "custom metrics" because you dared collect VM-level data.
- Self-hosted Prometheus + Grafana + Alertmanager: free in dollars, expensive in your weekends. Someone has to maintain the storage, upgrade the binaries, write the alert rules, fix the dashboards when a label changes.
The honest tradeoff: $9.90 for three nodes vs. a free stack that you maintain. If your time is worth more than $10/month, the math is obvious. If you genuinely enjoy maintaining a Prometheus cluster as a hobby, that's also a legitimate answer.
What about the agent itself — is it heavy?
On a Proxmox node it idles around 20-35 MB RSS. CPU is unmeasurable on anything modern. The binary is ~25 MB on disk. It will not be the thing that ruins your day.
If you're already running the agent on the host, you generally do not also install it inside the VMs unless you specifically want guest-level metrics. The host-side VM collector gives you allocation, state, and per-VM I/O. Guest agents only matter if you care about what's happening inside the VM — process list, the database that's running there, etc.
When NOT to use this
I'll be straight: if you're already running a fully built-out Prometheus stack with node_exporter on every Proxmox node, custom recording rules, Alertmanager wired into your on-call rotation, and you enjoy it — keep it. This post is not for you.
This is for the person whose "monitoring" is currently:
- Logging into the Proxmox UI when they remember
- Hoping nothing breaks
- Discovering things broke when their kid yells "Plex is broken!"
That person should install the agent right now and be done in ten minutes.
Doing it right now
On each Proxmox node:
```shell
curl -sSL get.netwarden.com | bash
```
Sign up at app.netwarden.com, grab a token per host, paste, done. The free tier covers one node — start there, see if you like it, expand to the rest of the cluster on Solo if it earns its keep.
If you'd rather run the whole thing yourself, the self-hosted single-binary build is in preview — it'll be one binary you point at a directory, no PostgreSQL required. Watch the docs for that landing.
Keep Reading
- Raspberry Pi Home Server Monitoring in 2026 — Same playbook, ARM64 edition, for your Pi running Plex and Pi-hole.
- Self-Hosted Uptime Monitoring: The Honest Pingdom Alternative — How Netwarden compares to Uptime Kuma, StatusCake, and HetrixTools for small homelab fleets.
- VM Monitoring — The full reference on what the libvirt/KVM/Proxmox collector picks up.
- Alerts Setup — Threshold alerts, durations, and notification routing.
Ready to stop hoping? Run `curl -sSL get.netwarden.com | bash` on your Proxmox node and you'll have real alerts inside ten minutes.