Show HN: Homelab Monitoring Setup with Grafana

conor_f | 155 points

I've been self-hosting about 30 services for years; of these, 3 are vital (Bitwarden, Home Assistant, and Pi-hole).

I work in IT and I'm a geek, so I've tried a few monitoring systems and written two myself.

Then I realized that I already have self-sustaining, 24/7 monitoring agents: my wife and children.

I gave up trying to have the right stack and now just wait for them to yell.

Seriously: it works great, and it made me wonder WHY I was trying to monitor at all. Turns out it's more about the fun of discovering tools than a real need at home.

BrandoElFollito | a year ago

This confirms what I suspected when I was trying to decide whether to host my own Grafana stack or use the Grafana Cloud free tier: that I'd end up spending a ton of time fiddling with a constellation of services I didn't actually care about, time I could spend on the projects and services I do care about.

I've not found it too hard to stay within the limits of the free tier. The 10-dashboard limit is the main one that actually constrains me, but I just put more stuff on each dashboard and live with the scrolling. The free retention is not great, but it's good enough for my purposes.

sjsdaiuasgdia | a year ago

I'm in the process of building out a Grafana stack (Prometheus, Loki, Tempo, Mimir, Grafana) for my day job right now.

...and also for one of my side projects, OSRBeyond.

It's easy to get overwhelmed by all the moving pieces, but it's also a lot of _fun_ to set up.

bovermyer | a year ago

I've found the VictoriaMetrics all-in-one binary to be the perfect size for home use, at least for metrics gathering.

It supports Prometheus querying and a few other ingestion formats, so any knowledge about "how to get data into Prometheus" applies pretty much 1:1, and their own vmagent is pretty advanced. I'm not related to the company in any way, just a happy user.

https://victoriametrics.com/
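In case it helps anyone evaluating this: a minimal sketch of the single-binary setup described above. Binary names, flags, ports, and file names here are assumptions to check against the docs for your version.

```shell
# 1. Run the single binary (one process replaces Prometheus' TSDB):
#    ./victoria-metrics-prod -storageDataPath=./vm-data -retentionPeriod=12
# 2. Point vmagent (or any Prometheus-compatible scraper) at your exporters:
cat > scrape.yml <<'EOF'
scrape_configs:
  - job_name: node
    static_configs:
      - targets: ['localhost:9100']   # node_exporter
EOF
#    ./vmagent -promscrape.config=scrape.yml \
#              -remoteWrite.url=http://localhost:8428/api/v1/write
# 3. Query with plain PromQL, exactly as you would against Prometheus:
#    curl 'http://localhost:8428/api/v1/query?query=up'
```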

adql | a year ago

Hey everyone, this is a post I've been working on the past few months about setting up my own monitoring stack with Grafana for my home server.

I'd love your feedback on how this process could be easier for me, some resources on learning the Grafana query languages, and general comments.

Thanks for taking the time to read + engage!

conor_f | a year ago

I have been using Zabbix to monitor my servers for the last few years, since I wanted something simple, and this Grafana/Prometheus stack always scared me because of, as the OP says, the number of "moving parts".

Zabbix has been quite solid: it has lots of templates for different servers (Linux, Windows, etc.) and triggers, and it can also monitor Docker containers (although I've never tried that).

The only thing Zabbix can't do well is log file monitoring, so I am considering adding something like an ELK stack.

tacker2000 | a year ago

Mildly related: can anyone recommend a time series database that supports easy aggregation by week (with the ability to configure the start of the week) and month? I'm looking for something to switch from InfluxDB which I'm currently using. The linked article is using Prometheus which also doesn't appear to support this functionality.
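Not mentioned in the thread, but one candidate worth checking (the details here are assumptions to verify against the docs): TimescaleDB's `time_bucket` accepts an optional origin timestamp, so weekly buckets can start on any weekday.

```shell
# SQL sketch for TimescaleDB; run against your own database, e.g.:
#   psql mydb -f weekly.sql
cat > weekly.sql <<'EOF'
-- Weekly buckets starting on Sunday (origin is any Sunday):
SELECT time_bucket(INTERVAL '1 week', ts, TIMESTAMPTZ '2024-01-07') AS week,
       avg(value)
FROM metrics
GROUP BY week ORDER BY week;

-- Monthly buckets: recent TimescaleDB versions accept calendar-aware
-- month intervals here (verify for your version):
SELECT time_bucket(INTERVAL '1 month', ts) AS month, avg(value)
FROM metrics
GROUP BY month ORDER BY month;
EOF
```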

shrx | a year ago

Is there anything easier for logs? Basically glorified ripgrep?

majkinetor | a year ago

check out netdata if y'all haven't already - incredible software

whalesalad | a year ago

I recently set up packet loss monitoring on a Raspberry Pi, using Prometheus for logging and graphing.

https://video.nstr.no/w/hjTH3Vggn2fvpTrQitMmVP

I would like to set up Grafana and more monitoring as well, on some of my other machines. But for now this is what I have :D
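For anyone wanting to replicate something like this: one common way to measure packet loss with Prometheus is the blackbox_exporter's ICMP prober. This is a sketch, not necessarily the poster's setup; file names and targets are assumptions.

```shell
# blackbox_exporter module config: ping targets over ICMP.
cat > blackbox.yml <<'EOF'
modules:
  icmp:
    prober: icmp
    timeout: 5s
EOF
# Prometheus scrape config: rewrite each target into a /probe request
# against the locally running blackbox_exporter.
cat > prometheus-scrape.yml <<'EOF'
scrape_configs:
  - job_name: blackbox-icmp
    metrics_path: /probe
    params:
      module: [icmp]
    static_configs:
      - targets: ['192.168.1.1', '1.1.1.1']
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: localhost:9115   # blackbox_exporter
EOF
# Loss over time is then roughly 1 - avg_over_time(probe_success[5m]) in PromQL.
```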

codetrotter | a year ago

Shameless plug for AppScope (https://github.com/criblio/appscope) which is designed for exactly this. Capturing observability data from processes in your environment without code modification, and shipping the data off to tools like grafana for monitoring.

czzzzz | a year ago

Has anyone had lots of trouble configuring Grafana via YAML from the documentation? A lot of it is kind of hard to follow.

I've found that the ability to (pre)configure Grafana without clicking around in it is pretty difficult.
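For reference, a minimal datasource provisioning file of the kind the docs describe, which Grafana reads at startup instead of requiring clicks in the UI. The mount path assumes the official Docker image; adjust for other installs.

```shell
mkdir -p provisioning/datasources
cat > provisioning/datasources/prometheus.yml <<'EOF'
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: http://prometheus:9090
    isDefault: true
EOF
# docker run -v "$PWD/provisioning:/etc/grafana/provisioning" grafana/grafana
```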

hardwaresofton | a year ago

Shameless plug for uptimeFunk (https://uptimefunk.com), which I soft-launched some time ago. I wanted uptime monitoring with a nice UI and a few advanced features that I didn't find anywhere else:

- monitoring MongoDB/replica set status

- monitoring SQL databases with basic SQL queries

- monitoring host CPU, RAM and disk usage

- monitoring Docker containers

- and being able to monitor all of this through SSH tunnels, because not all my services are on the internet

guybedo | a year ago

We've been using Nagios and Munin for years; this stack is rock solid. We recently added ELK, which feels overkill, heavyweight, and fragile.

shashasha2 | a year ago

I went down the Grafana rabbit hole, and without a doubt, it's a fantastic tool. It can handle just about any kind of data you throw at it, and when it comes to visualizing time series data, it's second to none. That said, it's a slog to set up and configure, but once finished, I had a beautiful dashboard for my home media server, and life was good. Unfortunately, a few months later, I was forced to upgrade and lacked the time to reconfigure Grafana. So, as a stopgap, I installed Netdata... fast-forward two years, and today I still haven't reconfigured Grafana, nor do I plan to.

For my use case, a home media server, Netdata turned out to be way simpler to set up, and, most importantly, way less of a hassle/dink-around. It's a basic plug-and-play operation with auto-discovery. While the dashboard isn't nearly as beautiful or configurable, it gets the job done and provides pretty much everything I need or want. It offers a quick overview, historical metrics (over a year of data) to analyze trends or spot potential issues, and push/email notifications if something goes awry.

If you decide to go down this route, there are two major items:

1. You'll need to configure the dbengine[1] database to save and store historical metric data. However, I found the dbengine configuration documentation to be a bit confusing, so I'll spare you the trouble - just use this Jupyter Notebook[2]. If needed, adjust the input, run it, scroll down, and you'll see a summary of the number of days, the maximum dbengine size, and the yaml config, which you can copy, paste, and voila.

2. If you're hoarding data, you'll probably want to set up smartmontools/smartd[3] in a separate Docker container for better disk monitoring metrics. However, I think you can enable hddtemp[4] with Netdata through the config if you don't want or need the extra hassle. You can have Netdata query this smartd container, but with a handful of disks it ends up timing out frequently, so I found it's best to simply set up smartd/smartd.conf to log the smartd data independently. Then all you need to do is tell Netdata where to find the smartd_log[5], and Netdata handles the rest.

Boom, home media server metrics with historical data, done. It still takes a bit of time to set up, but way less than Grafana. Anywho, hopefully, this saves you from wasting as much time as I did. And if you're looking for a smartd reference, shoot me a reply, and I'll tidy up and share my Docker config/scripts and notes.
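To make the two items above concrete, here's a sketch of the config fragments involved. Netdata's key names vary between versions, so treat these as assumptions and verify them against the notebook in [2]; the smartd `-A` flag writes attribute CSV logs that the smartd_log collector can read.

```shell
# netdata.conf fragment: enable dbengine with a fixed disk budget
# (section/key names are assumptions; check your Netdata version).
cat > netdata.conf.fragment <<'EOF'
[db]
    mode = dbengine
    # total disk budget for metric storage, in MiB
    dbengine multihost disk space MB = 2048
EOF
# smartd.conf fragment: scan all disks and write attribute CSVs
# to a directory Netdata's smartd_log collector is pointed at.
cat > smartd.conf.fragment <<'EOF'
DEVICESCAN -a -A /var/log/smartd/
EOF
```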

[1] https://learn.netdata.cloud/docs/typical-netdata-agent-confi... [2] https://colab.research.google.com/github/andrewm4894/netdata... [3] https://www.smartmontools.org/wiki [4] https://github.com/vitlav/hddtemp [5] https://learn.netdata.cloud/docs/data-collection/storage,-mo...

artisin | a year ago

Just push to github and people will contribute the rest for you. Easy!

revskill | a year ago

With 40 containers I would go Kubernetes, and with the kube-prometheus stack you basically have this up and running in 5 minutes.

Align metric endpoints for fine-tuning.

Add tracing in a few more clicks.
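A sketch of that 5-minute setup, assuming "Kube stack" means the community kube-prometheus-stack Helm chart (Prometheus, Alertmanager, and Grafana in one release); the release name and values are illustrative.

```shell
# Minimal override file (keys are the chart's; password is a placeholder):
cat > values.yaml <<'EOF'
grafana:
  adminPassword: changeme   # set your own
EOF
# helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
# helm repo update
# helm install monitoring prometheus-community/kube-prometheus-stack -f values.yaml
```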

Demmme | a year ago