When we talk about observability, we naturally associate it with terms such as monitoring, metrics, and logging. Wikipedia defines observability this way:
“Observability is a measure of how well internal states of a system can be inferred from knowledge of its external outputs.”
Put simply, observability means turning a system from a black box into a white box, so that we can understand and evaluate its internal working state as fully as possible while it is running.
For Linux observability, we usually combine several kinds of tools, including but not limited to the following:
- Monitoring systems: collect metrics at regular intervals, e.g. zabbix/prometheus;
- Logging systems: store and filter log content, e.g. rsyslog/ELK/loki;
- Tracing systems: record information for each event as it occurs, e.g. strace/ltrace/opentracing;
- Sampling systems: from a statistical point of view, collect a set of samples while the system is running (for example, at fixed time intervals) to approximately infer its operating state, e.g. sysstat/perf/top.
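To make the difference between tracing and sampling concrete, here is a minimal sketch in plain Python with no real profiler involved. It assumes we already have a list of call stacks captured at a fixed interval (the kind of data a sampling profiler such as perf can collect); the function names `fold_stacks` and `estimate_time_share` and the sample stacks are hypothetical. A tracer would record every function entry and exit, whereas a sampler only looks every so often and infers each function’s share of time from how often it shows up.

```python
from collections import Counter


def fold_stacks(samples):
    """Collapse sampled call stacks (outermost frame first) into
    "frame1;frame2;..." keys with a sample count, similar in spirit
    to the folded-stack input used by flame graph tools."""
    return Counter(";".join(stack) for stack in samples)


def estimate_time_share(samples):
    """Estimate the fraction of time each leaf function was running.

    With a fixed sampling interval, a function seen at the top of the
    stack in k of n samples is assumed to have run for roughly k/n of
    the observed period. This is an estimate, not an exact measurement:
    short calls that happen between two samples may never be seen.
    Each stack is assumed to be non-empty.
    """
    if not samples:
        return {}
    leaf_counts = Counter(stack[-1] for stack in samples)
    total = len(samples)
    return {func: count / total for func, count in leaf_counts.items()}


if __name__ == "__main__":
    # Hypothetical stacks captured at a fixed interval.
    samples = [
        ["main", "handle_request", "parse_json"],
        ["main", "handle_request", "parse_json"],
        ["main", "handle_request", "query_db"],
        ["main", "idle"],
    ]
    for stack, count in fold_stacks(samples).most_common():
        print(stack, count)
    print(estimate_time_share(samples))
```

The trade-off is the one described above: sampling keeps overhead bounded regardless of how many events occur, at the cost of only approximating where the time goes.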
These tool categories bring to mind a classic diagram:
This diagram shows observability tools for the different modules of the system, many of which we use frequently in daily operations work. Today, I don’t want to discuss these tools one by one; instead, I want to point out the limitations of some conventional tools. In some scenarios they are not enough to find the root cause of a problem and only point us in a general direction. For example, in some failure scenarios we must have insight into the runtime state of a specific kernel function or user space function to find the root cause. This requires fine-grained dynamic tracing and sampling in both user space and kernel space.
It’s time for our main character today, eBPF, to make its appearance.
In my opinion, it is one of the most popular Linux kernel technologies of recent years. Search for eBPF in any search engine, in any language, and you will find tons of articles. So I won’t describe in detail what eBPF is or what it can do. Instead, I’d like to summarize it in one succinct sentence:
eBPF is a programming framework that runs in kernel space; it contains an in-kernel virtual machine that runs custom programs as bytecode, which lets us trace and sample both user space and kernel space, and even modify the logic of both.
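To build intuition for “runs custom programs as bytecode”, here is a toy interpreter: a minimal sketch in Python, not how the kernel actually implements its VM. It uses the real 8-byte eBPF instruction layout (opcode, destination and source register nibbles, 16-bit offset, 32-bit immediate) and supports only three real opcodes: 64-bit move immediate (`0xb7`), 64-bit add immediate (`0x07`), and exit (`0x95`). The helper names `encode` and `run` are hypothetical. The real kernel first checks a program with its verifier and often JIT-compiles it to native code instead of interpreting it.

```python
import struct

# Three real eBPF opcodes (instruction class | operation | source).
BPF_MOV64_IMM = 0xB7  # BPF_ALU64 | BPF_MOV | BPF_K: dst = imm
BPF_ADD64_IMM = 0x07  # BPF_ALU64 | BPF_ADD | BPF_K: dst += imm
BPF_EXIT = 0x95       # BPF_JMP | BPF_EXIT: return the value in r0

INSN = struct.Struct("<BBhi")  # opcode, dst/src nibbles, offset, imm: 8 bytes
MASK64 = (1 << 64) - 1


def encode(opcode, dst=0, src=0, off=0, imm=0):
    """Pack one 8-byte eBPF instruction (little-endian layout)."""
    return INSN.pack(opcode, (src << 4) | dst, off, imm)


def run(bytecode):
    """Interpret a toy subset of eBPF and return r0.

    Only the three opcodes above are supported; the real instruction
    set is much larger.
    """
    regs = [0] * 11  # r0..r10; in real eBPF, r10 is the read-only frame pointer
    pc = 0
    while True:
        if (pc + 1) * INSN.size > len(bytecode):
            raise ValueError("fell off the end of the program")
        opcode, reg_byte, _off, imm = INSN.unpack_from(bytecode, pc * INSN.size)
        dst = reg_byte & 0x0F  # low nibble is dst_reg, high nibble is src_reg
        if opcode == BPF_MOV64_IMM:
            regs[dst] = imm & MASK64  # 32-bit imm is sign-extended to 64 bits
        elif opcode == BPF_ADD64_IMM:
            regs[dst] = (regs[dst] + imm) & MASK64  # wraps modulo 2**64
        elif opcode == BPF_EXIT:
            return regs[0]
        else:
            raise ValueError(f"unsupported opcode {opcode:#04x}")
        pc += 1


if __name__ == "__main__":
    program = (
        encode(BPF_MOV64_IMM, dst=0, imm=40)   # r0 = 40
        + encode(BPF_ADD64_IMM, dst=0, imm=2)  # r0 += 2
        + encode(BPF_EXIT)                     # return r0
    )
    print(run(program))
```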
As for how to use eBPF in the observability scenarios discussed in this article, the following steps are required, in brief:
- Understand the kernel’s tracing mechanisms (such as kprobes, uprobes, and tracepoints) in depth, and understand user space programs and kernel logic well enough to know which user space or kernel functions to observe in each scenario;
- Understand the mechanisms of the eBPF framework in depth, because you will build your own eBPF programs on top of it to observe user programs and the kernel;
- Use a user space eBPF programming framework such as bcc or libbpf to reduce the complexity of writing eBPF programs. With these frameworks, an eBPF program is split into two parts: one part runs as bytecode in kernel space and is written in C, which clang/llvm ultimately compiles into eBPF bytecode; the other part runs in user space and is written in C/python/lua/C++/go.
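The two-part split can be sketched with bcc’s Python bindings. This is a minimal example, assuming bcc and its Python module are installed and that the script runs as root on a kernel with kprobe support; the function and map names (`count_execve`, `counts`) are hypothetical. The C string is the kernel-space half, which bcc compiles to eBPF bytecode with clang/LLVM when `BPF(text=...)` is called; the Python code is the user-space half, which attaches the program to the `execve` syscall through a kprobe and then reads per-PID counts from the shared hash map.

```python
import os
import sys
import time

# Kernel-space half: restricted C that bcc compiles to eBPF bytecode
# with clang/LLVM at load time.
BPF_PROGRAM = r"""
#include <uapi/linux/ptrace.h>

// Hash map shared with user space: key = PID (tgid), value = call count.
BPF_HASH(counts, u32, u64);

int count_execve(struct pt_regs *ctx) {
    u32 pid = bpf_get_current_pid_tgid() >> 32;  // upper 32 bits = tgid
    counts.increment(pid);
    return 0;
}
"""


def main(duration=10):
    """User-space half: load the program, attach it, then read the map."""
    try:
        from bcc import BPF  # provided by the bcc Python bindings
    except ImportError:
        print("bcc Python bindings not found; install bcc first.", file=sys.stderr)
        return
    if os.geteuid() != 0:
        print("Loading eBPF programs requires root.", file=sys.stderr)
        return

    b = BPF(text=BPF_PROGRAM)
    # get_syscall_fnname() resolves "execve" to the kernel's actual syscall
    # symbol, whose name differs across kernel versions and architectures.
    b.attach_kprobe(event=b.get_syscall_fnname("execve"), fn_name="count_execve")
    print(f"Counting execve() calls for {duration}s...")
    try:
        time.sleep(duration)
    except KeyboardInterrupt:
        pass

    counts = b["counts"]
    for pid, count in sorted(counts.items(), key=lambda kv: kv[1].value, reverse=True):
        print(f"pid={pid.value:<8} execve={count.value}")


if __name__ == "__main__":
    main()
```

Aggregating in a kernel-side map and reading it once at the end keeps per-event overhead low compared with sending every event to user space; this is a common pattern in bcc tools.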
Clearly, eBPF is not an easy technology to use: it is not simply a matter of running a command and viewing the results.
So, for ops engineers, are there off-the-shelf eBPF programs that they can simply run to get results? Can eBPF be friendlier to ops engineers? Of course! That is exactly what bcc provides.
BCC (BPF Compiler Collection) was the first user space programming framework developed for eBPF. It not only lets developers write their own eBPF programs but also ships many ready-to-run eBPF tools, covering user space as well as the major kernel subsystems.
The following diagram gives the best summary of the bcc tools:
As the diagram shows, these eBPF programs can observe user space programs written in multiple languages, as well as the kernel’s filesystems, I/O subsystem, network subsystem, scheduling subsystem, memory subsystem, and so on.
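For ops work, a small lookup table is a handy text companion to that diagram. The sketch below is an illustration, not part of bcc: `BCC_TOOLS`, `tools_for`, and `find_bcc_tool` are hypothetical names, the grouping is a small, non-exhaustive selection of real bcc tools, and the install locations are assumptions that vary by distribution (Debian/Ubuntu packages typically add a `-bpfcc` suffix, while Fedora/RHEL typically install under `/usr/share/bcc/tools`).

```python
import os
import shutil

# A small, non-exhaustive selection of bcc tools grouped by what they observe.
BCC_TOOLS = {
    "processes": ["execsnoop", "exitsnoop", "killsnoop"],
    "languages": ["ucalls", "uflow", "ugc"],
    "filesystems": ["opensnoop", "ext4slower", "xfsslower", "vfsstat", "filetop"],
    "block_io": ["biolatency", "biosnoop", "biotop"],
    "network": ["tcpconnect", "tcpaccept", "tcpretrans", "tcplife"],
    "scheduler": ["runqlat", "runqlen", "cpudist", "offcputime"],
    "memory": ["memleak", "oomkill", "cachestat"],
}

# Assumed install directory on Fedora/RHEL-style systems.
BCC_TOOLS_DIR = "/usr/share/bcc/tools"


def tools_for(area):
    """Return the example tools for an area (case-insensitive), or []."""
    return BCC_TOOLS.get(area.lower(), [])


def find_bcc_tool(name):
    """Best-effort search for an installed bcc tool; returns a path or None."""
    candidates = [
        shutil.which(f"{name}-bpfcc"),      # Debian/Ubuntu naming (assumption)
        os.path.join(BCC_TOOLS_DIR, name),  # Fedora/RHEL layout (assumption)
        shutil.which(name),                 # already on PATH
    ]
    for path in candidates:
        if path and os.path.isfile(path) and os.access(path, os.X_OK):
            return path
    return None


if __name__ == "__main__":
    for area, names in BCC_TOOLS.items():
        for name in names:
            print(f"{area:<12} {name:<12} {find_bcc_tool(name) or 'not found'}")
```

On a machine with bcc installed, a path returned by `find_bcc_tool("execsnoop")` can then be run as root, for example with `subprocess.run([path])`.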
I will share more about using bcc tools in future posts.