Kube Cloud Pt4 | Observability
In this course, I’m going to walk through how to implement the 3 pillars of observability on our example application using the Logz.io platform. Before we do that, I’m going to explain the 3 pillars of observability, which are logging, tracing, and metrics.
Logz.io uses ELK for their log management platform. There’s not much more to say here that Logz.io won’t say better and in more detail, but I will touch on my philosophy:
Logging is the foundation of your application’s observability, and it should be very familiar to most developers: your logs are usually immediately visible in your console, and you use them to see whether your application is behaving the way you expect and to debug it when it isn’t. You can do a great deal from the logs alone, which is why they are so important. For example, you can scrape together makeshift tracing if you need to (but tracing tools are better), and you can parse logs to generate metrics if you need to (but metrics frameworks are better). You face a balancing act with logs: you can’t see what isn’t logged, so you should log almost everything possible, but storing logs over time can be expensive and sifting through noisy logs is tedious. My recommendation is to log so much that you feel like you might be overdoing it, and then use log level configuration to hide what isn’t needed.
Ideally, you should log:
- Every input to every method
- Every output from every method
- Every decision outcome
- Every exception
  - Including as much information as you can about the inputs that led to the exception
  - The full stack trace
This is a lot, but again, if you have the logs you can bootstrap another observability solution as needed. If you don’t have the logs, you’re pretty much guessing.
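As a rough illustration of the checklist above, here is a minimal Python sketch using only the standard `logging` module. The `logged` decorator, the `orders` logger name, and the `shipping_cost` function are all made up for this example and are not part of the example application. It logs inputs and outputs at DEBUG, a decision outcome at INFO, and exceptions with the full stack trace.

```python
import functools
import logging

logger = logging.getLogger("orders")  # hypothetical logger name


def logged(func):
    """Log every input, output, and exception of the wrapped function."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        logger.debug("%s called with args=%r kwargs=%r", func.__name__, args, kwargs)
        try:
            result = func(*args, **kwargs)
        except Exception:
            # logger.exception logs at ERROR level and appends the full stack trace.
            logger.exception("%s failed for args=%r kwargs=%r", func.__name__, args, kwargs)
            raise
        logger.debug("%s returned %r", func.__name__, result)
        return result

    return wrapper


@logged
def shipping_cost(weight_kg, express=False):
    # Hypothetical business rule, used only to show a logged decision outcome.
    if weight_kg <= 0:
        raise ValueError(f"weight must be positive, got {weight_kg}")
    rate = 4.0 if express else 2.5
    logger.info("shipping tier chosen: %s", "express" if express else "standard")
    return round(weight_kg * rate, 2)


if __name__ == "__main__":
    # The level is the knob: DEBUG shows everything, WARNING hides routine noise.
    logging.basicConfig(level=logging.DEBUG)
    shipping_cost(3, express=True)
```

Raising the level passed to `basicConfig` to `logging.WARNING` hides the input, output, and decision messages without touching the code, which is the "log a lot, filter by level" approach described above.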
Tracing is often overlooked, but it is a critical component of debugging a cloud-based distributed application. You must trace every request across all of your services (IMHO). Your tracing has to propagate across REST calls, message consumers, stateless functions, and so on, so that you have full visibility of the calls through your system.
Tracing not only allows you to debug, it can also be very useful in performance analysis. Traces can show you where slow calls are being made and where unnecessary or duplicate calls are being made, and missing traces can show you where your observability picture is incomplete.
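To make the idea of propagation concrete, here is a small Python sketch of the underlying mechanism: each service reuses the trace ID it receives and attaches it to every outgoing call. This is not how any particular tracing product works. The `X-Trace-Id` header name and the two stand-in services are assumptions for illustration, and a real tracing library would handle this for you.

```python
import contextvars
import uuid

# Hypothetical header name; real tracing systems define their own
# (for example the W3C Trace Context `traceparent` header).
TRACE_HEADER = "X-Trace-Id"

_trace_id = contextvars.ContextVar("trace_id", default=None)


def start_or_continue_trace(incoming_headers):
    """Reuse the caller's trace ID if present, otherwise start a new trace."""
    trace_id = incoming_headers.get(TRACE_HEADER) or uuid.uuid4().hex
    _trace_id.set(trace_id)
    return trace_id


def outgoing_headers(extra=None):
    """Build headers for a downstream call, carrying the current trace ID."""
    headers = dict(extra or {})
    trace_id = _trace_id.get()
    if trace_id is not None:
        headers[TRACE_HEADER] = trace_id
    return headers


def inventory_service(headers):
    # Stand-in for a second service: it continues the trace it was handed.
    return start_or_continue_trace(headers)


def order_service(headers):
    # Stand-in for the first service: it starts (or continues) a trace,
    # then calls the next service with the trace ID attached.
    trace_id = start_or_continue_trace(headers)
    downstream_trace_id = inventory_service(outgoing_headers())
    return trace_id, downstream_trace_id
```

Calling `order_service({})` returns the same ID for both services, which is what lets a tracing backend stitch the two calls into one request. If any hop forgets to forward the header, the trace breaks at that point.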
Metrics are often misunderstood, but Datadog has an excellent series of articles on monitoring that clearly explains how metrics are a key piece of any observability platform.
That being said, engineers often focus on the ‘resource’ metrics (errors, latency, throughput, and so on). While these are very important, the ‘work’ (or business) metrics are just as important, because they tell you how well your system is actually doing the work it was built to do. Because they take extra effort to implement and careful thought about what is important to collect, they are often neglected.
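The difference between the two kinds of metric is easier to see side by side. The sketch below uses a plain in-memory `Counter` instead of a real metrics framework, and the `handle_checkout` function and all metric names are hypothetical. The comments label each metric using the split described above.

```python
import time
from collections import Counter, defaultdict

counters = Counter()          # in-memory stand-in for a metrics client
timings = defaultdict(list)   # raw latency samples, in seconds


def handle_checkout(cart_total_cents, payment_ok):
    """Hypothetical request handler that records both kinds of metric."""
    started = time.perf_counter()
    try:
        counters["requests_total"] += 1                       # 'resource': throughput
        if not payment_ok:
            counters["errors_total"] += 1                     # 'resource': errors
            return False
        counters["orders_placed_total"] += 1                  # 'work': orders completed
        counters["revenue_cents_total"] += cart_total_cents   # 'work': value delivered
        return True
    finally:
        # 'resource': latency of the call
        timings["checkout_seconds"].append(time.perf_counter() - started)
```

The first group tells you whether the endpoint is healthy. Only the second group tells you whether customers are actually completing orders, and that is the part that takes deliberate thought to define.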
We’re going to implement the 3 pillars of observability in our application so that you know their value and how to leverage them in your projects. Let’s get started.