Program type `BPF_PROG_TYPE_KPROBE`

v4.1

BPF_PROG_TYPE_KPROBE programs are eBPF programs that can attach to kprobes. Kprobes are not an eBPF-specific feature, but the two work very well together. Traditionally, one would have to write a custom kernel module that could be invoked from a kprobe, or be content with just the trace log output. eBPF makes this process easier.

Usage

Probes come in five flavors: kprobe, kretprobe, uprobe, uretprobe, and usdt. kprobe and kretprobe probe the kernel; uprobe and uretprobe probe userspace. The normal probes are invoked when the probed location is executed. The ret variants execute once the function returns, allowing the return value to be captured.

The usdt probe is a uprobe in disguise: it expects to attach to tracepoints defined in userspace programs. See the USDT concept page for details.

All of these probe types work with the kprobe program type; the attach method determines how the program is executed.

The return value of a kprobe program doesn't do anything.

Context

The context passed to kprobe programs is struct pt_regs. This structure is different for each CPU architecture since it contains a copy of the CPU registers at the time the kprobe was invoked.

It is common for kprobe programs to use the macros from the libbpf bpf_tracing.h header file, which defines PT_REGS_PARM1 ... PT_REGS_PARM5 as well as a number of others. These macros translate to the correct field in struct pt_regs for the target architecture. You communicate the architecture you are compiling the BPF program for by defining one of the __TARGET_ARCH_* values, either in your program or on the command line while compiling.
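To illustrate the idea behind these macros, here is a simplified userspace mock. It is not the libbpf header: every name prefixed with mock_ or MOCK_ is hypothetical, and the structs are cut down to a few fields. The register choices follow the usual calling conventions (first and second argument in di and si on x86-64, regs[0] and regs[1] on arm64).

```c
#include <stdint.h>

/* Hypothetical, cut-down register layouts; the real struct pt_regs has many more fields. */
struct mock_x86_64_regs { uint64_t di, si, dx, cx, r8; };
struct mock_arm64_regs  { uint64_t regs[31]; };

/* Pick the layout at compile time, in the spirit of __TARGET_ARCH_*. */
#if defined(MOCK_TARGET_ARCH_arm64)
typedef struct mock_arm64_regs mock_pt_regs;
#define MOCK_PARM1(ctx) ((ctx)->regs[0])
#define MOCK_PARM2(ctx) ((ctx)->regs[1])
#else /* this sketch defaults to x86-64 */
typedef struct mock_x86_64_regs mock_pt_regs;
#define MOCK_PARM1(ctx) ((ctx)->di)
#define MOCK_PARM2(ctx) ((ctx)->si)
#endif

/* Architecture-independent code only ever touches the macros. */
static uint64_t sum_first_two_args(const mock_pt_regs *ctx)
{
    return MOCK_PARM1(ctx) + MOCK_PARM2(ctx);
}
```

The body of sum_first_two_args never names a register, which is the property that lets one kprobe program source compile for several architectures.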

The same header file also provides the BPF_KPROBE(name, args...) macro, which lets program authors declare the function signature in the same fashion as the function they are tracing, including type information. The macro casts each positional argument to the given argument name and type. For example:

SEC("kprobe/proc_sys_write")
int BPF_KPROBE(my_kprobe_example,
           struct file* filp, const char* buf,
           size_t count, loff_t* ppos) {
    ...
}

Similar macros also exist for kprobes intended to attach to syscalls, BPF_KSYSCALL(name, args...), and for kretprobes, BPF_KRETPROBE(name, args...).

Attachment

There are two methods of attaching probe programs with variations for uprobes. The "legacy" way involves the manual creation of a k{ret}probe or u{ret}probe event via the DebugFS and then attaching a BPF program to that event via the perf_event_open syscall.

The newer method uses BPF links. For multiple probes, a single link create call does both the probe event creation and the attaching. Single probes are still created via the perf_event_open syscall, but with different parameters, and are then attached with a BPF link.

Legacy kprobe attaching

The first step is to create a kprobe or kretprobe trace event. To do so we can use the DebugFS, which we will assume is mounted at /sys/kernel/debug for the purposes of this document.

Existing kprobe events can be listed by printing /sys/kernel/debug/tracing/kprobe_events, and new events are created by writing to this pseudo-file. For example, executing echo 'p:myprobe do_sys_open' > /sys/kernel/debug/tracing/kprobe_events creates a new kprobe (p:) called myprobe at the do_sys_open function in the kernel. For details on the full syntax, see the kernel documentation on kprobe-based event tracing. Kretprobes are created by specifying an r: prefix instead.

After the probe has been created, a new directory will appear in /sys/kernel/debug/tracing/events/kprobes/ with the same name as we have given our probe, /sys/kernel/debug/tracing/events/kprobes/myprobe in this case. This directory contains a few pseudo-files, of which id is the important one for us. The file /sys/kernel/debug/tracing/events/kprobes/myprobe/id contains a unique identifier we will need in the next step.
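The shell command above can also be issued from a program. The following is a minimal sketch; the function names format_kprobe_event and write_kprobe_event are made up for this example, and it only covers the simple "prefix:name symbol" form rather than the full definition syntax.

```c
#include <stdio.h>

/*
 * Format a definition line for kprobe_events.
 * is_return selects the "r:" (kretprobe) prefix instead of "p:" (kprobe).
 * Returns 0 on success, -1 if the line does not fit in buf.
 */
static int format_kprobe_event(char *buf, size_t len, int is_return,
                               const char *event, const char *symbol)
{
    int n = snprintf(buf, len, "%c:%s %s", is_return ? 'r' : 'p', event, symbol);

    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Append one definition line to the kprobe_events pseudo-file at path. */
static int write_kprobe_event(const char *path, const char *line)
{
    FILE *f = fopen(path, "a");
    int ok;

    if (!f)
        return -1;
    ok = fprintf(f, "%s\n", line) > 0;
    /* fclose flushes the buffer, so a rejected definition surfaces here. */
    return (fclose(f) == 0 && ok) ? 0 : -1;
}
```

The sketch opens the file in append mode so that existing events are kept; the shell's > redirection truncates the file instead, which clears previously defined probes.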
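Reading the identifier is a plain file read. A small sketch, assuming the id file holds a single decimal number; read_probe_id is a name invented for this example.

```c
#include <stdio.h>

/*
 * Read the numeric ID from a trace event's "id" pseudo-file, e.g.
 * /sys/kernel/debug/tracing/events/kprobes/myprobe/id.
 * Returns the ID, or -1 if the file cannot be opened or parsed.
 */
static long long read_probe_id(const char *path)
{
    long long id = -1;
    FILE *f = fopen(path, "r");

    if (!f)
        return -1;
    if (fscanf(f, "%lld", &id) != 1 || id < 0)
        id = -1;
    fclose(f);
    return id;
}
```

The returned value is what goes into the config field of the perf event attributes.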

The next step is to open a new perf event using the perf_event_open syscall:

struct perf_event_attr attr = {
    .type = PERF_TYPE_TRACEPOINT,
    .size = sizeof(struct perf_event_attr),
    .config = kprobe_id, /* The ID of your kprobe */
    .sample_period = 1,
    .sample_type = PERF_SAMPLE_RAW,
    .wakeup_events = 1,
};

syscall(SYS_perf_event_open,
    &attr,  /* struct perf_event_attr * */
    -1,     /* pid_t pid */
    0,      /* int cpu */
    -1,     /* int group_fd */
    PERF_FLAG_FD_CLOEXEC /* unsigned long flags */
);

This syscall returns a file descriptor on success. The final step consists of two ioctl syscalls: one to attach our BPF program to the kprobe event and one to enable the kprobe.

ioctl(perf_event_fd, PERF_EVENT_IOC_SET_BPF, bpf_prog_fd); to attach.

ioctl(perf_event_fd, PERF_EVENT_IOC_ENABLE, 0); to enable.

The kprobe can be temporarily disabled with the PERF_EVENT_IOC_DISABLE ioctl option. Otherwise the kprobe stays attached until the perf_event goes away, either because the perf_event FD is closed or because the program exits. The perf event holds a reference to the BPF program, so it stays loaded until no more kprobes reference it.
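Put together, the legacy sequence might look like the sketch below. The function name attach_legacy_kprobe is hypothetical, the error handling is deliberately minimal, and the caller is assumed to have already created the trace event and loaded the BPF program.

```c
#include <errno.h>
#include <linux/perf_event.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <unistd.h>

/*
 * Open a perf event for an existing kprobe trace event, attach a loaded
 * BPF program to it, and enable it.
 * Returns the perf event FD on success, -1 on failure (errno is set).
 */
static int attach_legacy_kprobe(uint64_t kprobe_id, int bpf_prog_fd)
{
    struct perf_event_attr attr;
    int perf_fd, saved_errno;

    memset(&attr, 0, sizeof(attr));
    attr.type = PERF_TYPE_TRACEPOINT;
    attr.size = sizeof(attr);
    attr.config = kprobe_id; /* contents of the event's id file */
    attr.sample_period = 1;
    attr.sample_type = PERF_SAMPLE_RAW;
    attr.wakeup_events = 1;

    /* glibc has no wrapper for perf_event_open, so use syscall(2). */
    perf_fd = (int)syscall(SYS_perf_event_open, &attr, -1 /* pid */,
                           0 /* cpu */, -1 /* group_fd */,
                           PERF_FLAG_FD_CLOEXEC);
    if (perf_fd < 0)
        return -1;

    if (ioctl(perf_fd, PERF_EVENT_IOC_SET_BPF, bpf_prog_fd) < 0 ||
        ioctl(perf_fd, PERF_EVENT_IOC_ENABLE, 0) < 0) {
        saved_errno = errno;
        close(perf_fd); /* closing the FD also drops the attachment */
        errno = saved_errno;
        return -1;
    }
    return perf_fd;
}
```

The caller keeps the returned FD open for as long as the probe should stay attached and closes it to detach.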

Link kprobe attaching

The more modern and preferred way of attaching is using the link create command of the BPF syscall. For single probes, open a new perf event using the perf_event_open syscall. Note that the values of the attributes of the perf event structure are a little different here compared to the legacy way.

struct perf_event_attr attr = {
    .type = 8, /* read type from /sys/bus/event_source/devices/kprobe/type or uprobe/type */
    .sample_type = PERF_SAMPLE_RAW,
    .sample_period = 1,
    .wakeup_events = 1,
    .size = sizeof(attr),
    .config = 1 << 0, /* bit 0 requests a return probe; use 0 for an entry probe */
    .kprobe_func = (uint64_t)"symbol_name", /* symbol name as a string, valid names can be found in /proc/kallsyms; shares storage with config1 */
    .probe_offset = 0x0, /* offset must be a valid instruction, here it is just the start of the kernel symbol; shares storage with config2 */
};

syscall(SYS_perf_event_open,
    &attr,  /* struct perf_event_attr * */
    -1,     /* pid_t pid */
    0,      /* int cpu */
    -1,     /* int group_fd */
    PERF_FLAG_FD_CLOEXEC /* unsigned long flags */
);
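Bit 0 of .config in the attribute structure above is what selects a return probe on the kprobe PMU. The kernel publishes that bit position in /sys/bus/event_source/devices/kprobe/format/retprobe as text of the form config:0. The sketch below parses such text rather than hard-coding the bit; both function names are invented for this example.

```c
#include <stdint.h>
#include <stdio.h>

/*
 * Parse the bit position from the contents of a PMU format file such as
 * /sys/bus/event_source/devices/kprobe/format/retprobe ("config:0").
 * Returns the bit index, or -1 if the text is not in that form.
 */
static int parse_retprobe_bit(const char *text)
{
    int bit;

    if (sscanf(text, "config:%d", &bit) != 1 || bit < 0 || bit > 63)
        return -1;
    return bit;
}

/* Build the perf_event_attr.config value for an entry or a return probe. */
static uint64_t kprobe_config(int retprobe_bit, int is_return_probe)
{
    return is_return_probe ? (UINT64_C(1) << retprobe_bit) : 0;
}
```

The PMU type read from the type file can be handled the same way, by reading the value at runtime instead of relying on a constant such as 8.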

After the perf event syscall is successful, the valid file descriptor returned can be used to set the link_create.target_fd attribute in the bpf structure before the link create command is called.

union bpf_attr attr = {
    .link_create.prog_fd = prog_fd, /* valid fd to bpf program of type KPROBE */
    .link_create.target_fd = perf_fd, /* valid fd to PMU event */
    .link_create.attach_type = BPF_PERF_EVENT,
    .link_create.flags = 0,
    .link_create.perf_event.bpf_cookie = 0,
};

syscall(SYS_bpf,
    BPF_LINK_CREATE,
    &attr,
    sizeof(attr)
);
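Wrapped in a function, the link creation step could be sketched as follows. The name link_kprobe_to_perf_event is hypothetical, and the sketch assumes kernel headers recent enough to define BPF_PERF_EVENT and the perf_event member of link_create.

```c
#include <linux/bpf.h>
#include <stdint.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/*
 * Create a BPF link that attaches prog_fd to an already opened perf event.
 * Returns the link FD on success, -1 on failure (errno is set).
 */
static int link_kprobe_to_perf_event(int prog_fd, int perf_fd, uint64_t cookie)
{
    union bpf_attr attr;

    /* Zero the whole union first; the kernel checks that unused fields are zero. */
    memset(&attr, 0, sizeof(attr));
    attr.link_create.prog_fd = prog_fd;
    attr.link_create.target_fd = perf_fd;
    attr.link_create.attach_type = BPF_PERF_EVENT;
    attr.link_create.perf_event.bpf_cookie = cookie;

    return (int)syscall(SYS_bpf, BPF_LINK_CREATE, &attr, sizeof(attr));
}
```

Using memset rather than a designated initializer is a deliberate choice here, since it guarantees that every byte of the union is zero before the fields are filled in.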

For multiple probes, the link create command with the BPF_TRACE_KPROBE_MULTI attach type can be used to combine the creation and linking of the probes. Fprobes are used under the hood for multiple kprobes.

union bpf_attr attr = {
    .link_create.prog_fd = prog_fd,
    .link_create.attach_type = BPF_TRACE_KPROBE_MULTI,
    .link_create.kprobe_multi.cnt = sym_count,
    .link_create.kprobe_multi.cookies = 0,
    .link_create.kprobe_multi.flags = BPF_F_KPROBE_MULTI_RETURN, /* return probes; use 0 for entry probes */
    .link_create.kprobe_multi.syms = (uint64_t)sym_name, /* array of sym_count pointers to symbol name strings */
};

syscall(SYS_bpf,
    BPF_LINK_CREATE,
    &attr,
    sizeof(attr)
);

Attaching to a session is possible through the attachment type BPF_TRACE_KPROBE_SESSION. In this approach, a single link attaches the BPF program to both the function entry and return probes. Combining both entry and exit probes into one session attachment simplifies the overall code structure and maintainability. Traditionally for many tools, such as Tetragon and bpftrace, an extra entry probe is created solely for setting up the return probe, but with session links, this extra execution is omitted.

union bpf_attr attr = {
    .link_create.prog_fd = prog_fd,
    .link_create.attach_type = BPF_TRACE_KPROBE_SESSION,
    /* the kprobe_multi fields (cnt, syms, ...) are filled in as for BPF_TRACE_KPROBE_MULTI */
};

syscall(SYS_bpf,
    BPF_LINK_CREATE,
    &attr,
    sizeof(attr)
);

Helper functions

Not all helper functions are available in all program types. These are the helper calls available for kprobe programs:

Supported helper functions

KFuncs

Supported kfuncs
