Why I Built Netwarden — A Homelab Story
An honest founder note on what pushed me to build my own monitoring tool: a homelab that outgrew Uptime Kuma, a cloud monitoring bill that didn't make sense, and the kind of 2am I want fewer of.
I didn't set out to build a monitoring company. I set out to stop getting bitten by my own homelab.
This is the short version. No corporate "we" in this post — it's me writing.
The 2am that started it
A while back, on a Saturday night, my wife's WordPress site went down. I was on the couch. The site is a side project — a few hundred visitors a day, nothing dramatic. But it was down, and I didn't know it was down, and she found out before I did because someone messaged her about it.
I went into the office, opened the laptop, checked Uptime Kuma — green. Checked the host — also green, as far as Kuma could tell. Eventually I figured out the underlying VM was fine, but the MariaDB container had run out of memory, the WordPress container couldn't talk to it, and Kuma's HTTP check was happily passing because Nginx was returning a cached 200 from the maintenance page.
Lesson one: HTTP-200 is not "the thing works." Lesson two: I had no visibility into what was actually happening at the container or database layer on my own box. Lesson three: I wanted to know about this from the kitchen, not from the office, and ideally before the message arrived.
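For the curious, here's the shape of a check that would have caught it: something that touches the database, not just the front door. A toy sketch in Go, not anything from the actual site — the DSN and port are placeholders, and it assumes the standard go-sql-driver/mysql driver:

```go
// Sketch of a health endpoint that fails when the database is down,
// even if the web tier can still serve a cached page. Illustrative
// only — the DSN and listen port are placeholders.
package main

import (
	"database/sql"
	"net/http"

	_ "github.com/go-sql-driver/mysql" // MySQL/MariaDB driver
)

func main() {
	db, err := sql.Open("mysql", "user:pass@tcp(127.0.0.1:3306)/wordpress")
	if err != nil {
		panic(err)
	}
	http.HandleFunc("/healthz", func(w http.ResponseWriter, r *http.Request) {
		// Ping opens a real connection to the database — a cached
		// 200 from the web tier can't fake this.
		if err := db.PingContext(r.Context()); err != nil {
			http.Error(w, "db unreachable: "+err.Error(), http.StatusServiceUnavailable)
			return
		}
		w.Write([]byte("ok"))
	})
	http.ListenAndServe(":8081", nil)
}
```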
That's the moment.
Why the existing tools didn't work for me
I tried, in good faith, three different shapes of solution.
Uptime Kuma was where I started. I love Kuma. It's a beautiful little tool. But it's an uptime tool — it pings things, it doesn't see inside the host. I could not have caught the MariaDB OOM with Kuma. That's not its job.
Prometheus + Grafana + node_exporter + Alertmanager is what I ran for about six weeks after that. I made it work. I had dashboards. I felt very serious. I also spent a non-trivial chunk of every weekend keeping it healthy — cardinality explosions when I added a label, scrape configs going stale when I rebuilt a node, Grafana auth being weird, Alertmanager routing rules I had to re-read every time I wanted to change them. For someone running 4,000 hosts in a real SRE org, that surface area pays for itself ten times over. For my homelab and a couple of side projects? It was a second job.
The big cloud monitoring vendors. I priced out Datadog. The math didn't work. By the time I added the agent, APM, log management for the WordPress error log, and a few synthetic checks, the projected monthly bill was higher than the AWS bill it was supposed to be watching. That's not Datadog being evil — that's me genuinely not being the customer they're built for. They're built for companies. I'm a person with a NUC under the desk and a small VPS bill.
So I had three tiers: too shallow (Kuma), too much work (the open-source stack), or too expensive (the cloud vendors). The middle was empty.
What I actually wanted
I wrote it down at one point, and the list was short.
- One install command. Not seven. I didn't want to deploy node_exporter separately from cadvisor separately from mysqld_exporter separately from a postgres exporter. One binary, one curl, every host.
- Real auto-discovery. If MySQL is running on this host, I should not have to tell the agent that MySQL is running on this host. It should look. Same for Docker, Podman, Postgres, libvirt, Proxmox, WordPress.
- Mobile push as a first-class channel. Not "you can configure a webhook to a push gateway." Just push. So I find out from the kitchen, the car, the grocery store — not from the laptop. This was the single biggest piece for me. The whole reason I built it was to find out about my own infrastructure when I wasn't sitting at a desk.
- Per-host pricing, not per-metric. I'm a homelabber. I have 8 hosts. I want to pay for 8 hosts. I do not want to play the metric-cardinality optimization game on a hobby setup.
- Self-hostable, eventually. This one was a "must, but not day one." I was OK using a hosted service to get going as long as the same software could one day run on my own box.
That list is, almost line for line, what shipped.
The principles under the hood
A few things follow directly from that list, and they're what makes the product feel different from the enterprise monitoring suites I'd tried to use.
One Go binary per host. Not a binary plus a sidecar plus a node-level exporter plus a cluster-level exporter. The agent is a single Go binary that does all of it. Install it once, and it figures out the rest.
Auto-discovery is the actual feature. The "real" version of this means the agent looks at the system, sees what's running, and turns on the right collectors. Docker socket present → container metrics on. MariaDB socket present → database metrics on. Libvirt running → VM state on. WordPress plugin reachable → site-level metrics on. You don't write a config file describing your services. The whole point of running monitoring is to find out what's wrong, and the discovery layer is the thing that makes that not a bookkeeping problem.
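To make that concrete, here's a caricature of socket-presence discovery — the shape of the idea only, not the agent's actual source. The socket paths are common defaults and vary by distro:

```go
// Illustrative socket-presence discovery: if the service's unix
// socket exists, turn on the matching collector. Not Netwarden's
// real code; paths are common defaults.
package main

import (
	"fmt"
	"os"
)

type probe struct {
	name string
	path string // unix socket whose presence implies the service
}

func main() {
	probes := []probe{
		{"docker", "/var/run/docker.sock"},
		{"mariadb/mysql", "/run/mysqld/mysqld.sock"},
		{"postgres", "/var/run/postgresql/.s.PGSQL.5432"},
		{"libvirt", "/var/run/libvirt/libvirt-sock"},
	}
	for _, p := range probes {
		if _, err := os.Stat(p.path); err == nil {
			fmt.Printf("enable %s collector (%s present)\n", p.name, p.path)
		}
	}
}
```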
Threshold alerts, on purpose. I'm not running anomaly detection on three hosts of homelab traffic. There isn't enough signal to be statistical about it, and the false-positive rate of "ML-driven baselining" at small scale is brutal. I'd rather pick "MariaDB connections above 90% of max for 5 minutes" myself and have it fire when it should and stay silent when it shouldn't. So that's the alerting model. It's deliberate, not a limitation.
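If you want the mechanics of that rule, here's a toy version of a threshold with a hold-down window — illustrative Go, not Netwarden's actual config or code, and the metric name is made up:

```go
// Illustrative threshold rule with a "for N minutes" hold-down —
// the shape of the idea, not Netwarden's actual implementation.
package main

import (
	"fmt"
	"time"
)

type Rule struct {
	Metric string        // e.g. "mariadb.connections_pct" (hypothetical name)
	Above  float64       // fire when the value exceeds this...
	For    time.Duration // ...continuously for at least this long
}

// Evaluate returns true once the metric has been over the threshold
// for the full window. since tracks when the current breach started.
func (r Rule) Evaluate(value float64, now time.Time, since *time.Time) bool {
	if value <= r.Above {
		*since = time.Time{} // back under the line: reset the window
		return false
	}
	if since.IsZero() {
		*since = now // breach just started
	}
	return now.Sub(*since) >= r.For
}

func main() {
	rule := Rule{Metric: "mariadb.connections_pct", Above: 90, For: 5 * time.Minute}
	var since time.Time
	start := time.Now()
	// Simulate six one-minute samples at 95% utilization: the rule
	// stays quiet until the fifth minute, then fires.
	for i := 0; i < 6; i++ {
		t := start.Add(time.Duration(i) * time.Minute)
		fmt.Println(t.Format("15:04"), "fire:", rule.Evaluate(95, t, &since))
	}
}
```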
Push first. Email second. Webhook third. In that order. Push is for "you should look up from your dinner." Email is for "you should look at this when you get to a computer." Webhook is for "you have your own thing and want to feed alerts into it." That's the entire alerting taxonomy, and it covers the cases I actually have.
Flat per-host pricing. Free tier is 1 host. Solo is $9.90/month for 5. Pro is $29.90/month for 25. There is no per-metric tier and no per-event-ingested tier. This is partly principle and partly self-interest — I didn't want to do cardinality math on my own infra, so I'm not going to make anyone else do it on theirs.
You can read the practical version of all of this — what to monitor, what alerts to set, where to start — in the small-team monitoring playbook. That post is the cornerstone for the "how"; this one is the "why."
What's deliberately NOT in the product
This is the part that's uncomfortable to write but I think is the most useful part of this post.
If you're evaluating Netwarden against an enterprise tool, here is what is honestly missing or out of scope right now:
- No Slack/PagerDuty/Discord/Teams integrations yet. Push, email, and webhook are the channels. If you want Slack today, you point a webhook at Slack's incoming webhook URL — it works, but it's not native (there's a sketch of exactly this after the list). Native chat integrations are on the list, but not done.
- No log aggregation. Netwarden is metrics, alerts, and host state. It is not a logging product. If you need centralized logs, run something else for logs.
- No anomaly detection / ML-driven baselining. Threshold-based, on purpose — see above. I may add baseline-deviation alerts for users with enough traffic to make them meaningful, but it's not a roadmap headline.
- No distributed tracing / APM. Out of scope. The product is infrastructure-and-service monitoring, not application performance monitoring.
- No synthetic browser testing or RUM. Different product category.
- No 4,000-host federation features. It's designed primarily for 1-100 host operations. The architecture would handle more, but the use cases above ~100 hosts have priorities I haven't built for.
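Since I mentioned the webhook-to-Slack workaround: Slack's incoming webhooks expect a JSON body with a text field, so if your alert payload isn't already in that shape, a tiny relay can translate. A sketch — the webhook URL is a placeholder, and I'm making no claims here about Netwarden's actual webhook payload format:

```go
// Hypothetical relay: accepts an alert webhook POST (payload shape
// assumed, not documented here) and re-posts it to Slack's incoming
// webhook, which expects {"text": "..."}.
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
	"log"
	"net/http"
)

const slackURL = "https://hooks.slack.com/services/T000/B000/XXXX" // placeholder

func main() {
	http.HandleFunc("/alert", func(w http.ResponseWriter, r *http.Request) {
		body, _ := io.ReadAll(r.Body)
		// Wrap whatever arrived as the Slack message text.
		msg, _ := json.Marshal(map[string]string{
			"text": fmt.Sprintf("alert: %s", body),
		})
		resp, err := http.Post(slackURL, "application/json", bytes.NewReader(msg))
		if err != nil {
			http.Error(w, err.Error(), http.StatusBadGateway)
			return
		}
		resp.Body.Close()
		w.WriteHeader(http.StatusNoContent)
	})
	log.Fatal(http.ListenAndServe(":8080", nil))
}
```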
I'd rather lose the deal where someone needs Slack today than ship a checkbox-quality Slack integration that breaks in subtle ways. Same for the rest of this list.
Where I want it to go
In rough order of how often I think about them:
- Self-hosted single binary, GA. A Bun-based standalone binary is in preview right now. Same UI, same agent, runs on your own box, no Postgres or Redis to administer. This is the version I want for my own homelab — the one where I don't depend on any cloud at all. No firm date, but it's the next big thing.
- Native chat integrations. Slack first, probably Discord next. Not because I love Slack, but because it's where on-call lives for a lot of small teams.
- Better cluster-shaped views for Proxmox and Docker Swarm users. The data is already there from the agent; it's a UI question. There's a Proxmox-specific post that walks through the current state.
- More database engines. MySQL/MariaDB and PostgreSQL today. Beyond that, I'd rather let user demand decide which engine comes next than guess.
I also want to keep saying no to a long list of features that would make Netwarden into the thing I was trying to escape from. That part is the harder discipline.
The honest pitch
If you're a small team, a solo dev, or a homelabber and you've been bouncing between "Uptime Kuma is too shallow" and "Datadog is too expensive and Prometheus is too much weekend work," I'd genuinely like you to try this.
The free tier covers 1 host, no card. Run curl -sSL get.netwarden.com | bash on whatever box bothers you most, hook it to your phone, and wait a week. If it doesn't earn its keep in that week, walk away — I'd rather hear that than have you stuck on a tool that isn't working for you.
If you specifically care about running it on your own hardware with no cloud dependency, the self-hosted Bun binary preview is the thing to follow. That one's going to take a while to mature, but it's coming.
I built this because I wanted it. If it turns out you wanted something close enough to it, that's the best outcome I could hope for.
— Thiago
Keep reading
- The small-team and homelab monitoring playbook — the practical "how" companion to this "why."
- Alerts that actually page you — the alerting philosophy in detail.