Your logging is probably down

Ek-Hou-Van-Braai@piefed.social · 1 day ago

Your logging is probably down

Possibly linux@lemmy.zip · 22 hours ago

You need monitoring

wizardbeard@lemmy.dbzer0.com · 21 hours ago

I’m remembering a distinctly un-fun discussion my team had, after an outage our monitoring system missed, about how “the monitoring system not sending any alerts doesn’t inherently mean everything is OK.”

You need to make sure you’re monitoring connectivity as well as specific problem states. “No data” is a problem state that is often overlooked, and these systems don’t always treat it as one for every resource type out of the box.
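To make the "no data" point concrete, here is a minimal sketch of a check that treats missing or stale data as its own state instead of letting it pass as healthy. Everything in it (`Sample`, `classify`, the threshold and `max_age` parameters) is made up for illustration and is not taken from any particular monitoring product.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Sample:
    """Hypothetical latest reading for one monitored resource."""
    value: float
    timestamp: float  # seconds since the epoch


def classify(sample: Optional[Sample], now: float,
             threshold: float, max_age: float) -> str:
    """Return 'no_data', 'alert', or 'ok' for one resource."""
    # Missing or stale data is checked first: an old "healthy" value
    # says nothing about the current state of the resource.
    if sample is None or now - sample.timestamp > max_age:
        return "no_data"
    if sample.value > threshold:
        return "alert"
    return "ok"
```

The ordering is the design choice that matters here: a check that only compares the last value against the threshold would keep reporting "ok" for a host that stopped reporting hours ago.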

And you probably want a heartbeat notification. Yes, it’s noise, but if you don’t see anything from monitoring, you need to question whether monitoring is the thing that broke. Having it send out a notification every so often saying “yes, I am online” is useful.
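One way to act on the heartbeat idea without reading every "I am online" message yourself is a dead man's switch: something outside the monitoring system records each heartbeat and complains when they stop. A rough sketch, assuming a hypothetical `HeartbeatWatchdog` class that you would run on a different host from the monitoring it watches:

```python
import time
from typing import Callable


class HeartbeatWatchdog:
    """Tracks how long it has been since the last heartbeat arrived."""

    def __init__(self, max_silence: float,
                 clock: Callable[[], float] = time.monotonic):
        self.max_silence = max_silence  # seconds of silence tolerated
        self.clock = clock              # injectable so it can be tested
        self.last_beat = clock()

    def beat(self) -> None:
        """Call this whenever the monitoring system checks in."""
        self.last_beat = self.clock()

    def is_silent(self) -> bool:
        """True if no heartbeat has arrived within max_silence seconds."""
        return self.clock() - self.last_beat > self.max_silence
```

Something still has to poll `is_silent()` on a schedule and notify you through a channel that does not depend on the monitoring system, otherwise the watchdog goes down along with the thing it is watching.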

shane@feddit.nl · 21 hours ago

One alert daily reporting that there are no alerts is probably good for a home lab…

jaschen306@sh.itjust.works · 21 hours ago

Kubernetes? New Relic?