Program type BPF_PROG_TYPE_KPROBE
BPF_PROG_TYPE_KPROBE programs are eBPF programs that can attach to kprobes. Kprobes are not an eBPF-specific feature, but the two work very well together. Traditionally, one would have to write a custom kernel module which could be invoked from a kprobe, or be content with just the trace log output. eBPF makes this process easier.
Usage
Probes come in 4 different flavors: kprobe, kretprobe, uprobe, and uretprobe. kprobe and kretprobe are used to probe the kernel, while uprobe and uretprobe are used to probe userspace. The normal probes are invoked when the probed location is executed. The ret variants execute once the function returns, allowing the return value to be captured.
All of these probe types work with the kprobe program type; it is the attach method which determines how the program is executed.
The return value of kprobe programs doesn't do anything.
Context
The context passed to kprobe programs is struct pt_regs. This structure is different for each CPU architecture since it contains a copy of the CPU registers at the time the kprobe was invoked.
It is common for kprobe programs to use the macros from the libbpf bpf_tracing.h header file, which defines PT_REGS_PARM1 ... PT_REGS_PARM5 as well as a number of others. These macros translate to the correct field in struct pt_regs depending on the current architecture. You communicate the architecture you are compiling the BPF program for by defining one of the __TARGET_ARCH_* values in your program or via the command line while compiling.
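To illustrate how such accessor macros can hide architecture differences, here is a minimal user-space sketch. The struct and macro names (demo_regs_x86, demo_regs_arm64, DEMO_REGS_PARM1, DEMO_TARGET_ARCH_arm64) are hypothetical stand-ins, not libbpf's definitions. Only the register choice follows the usual calling conventions: rdi and rsi on x86-64, x0 and x1 on arm64 for the first two arguments.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, trimmed-down register snapshots. The real struct pt_regs
 * of each architecture has many more fields. */
struct demo_regs_x86 {
    uint64_t di, si, dx, cx, r8;
};

struct demo_regs_arm64 {
    uint64_t regs[31];
};

/* DEMO_TARGET_ARCH_arm64 plays the role of a __TARGET_ARCH_* define:
 * it selects which struct fields the accessor macros expand to. */
#if defined(DEMO_TARGET_ARCH_arm64)
typedef struct demo_regs_arm64 demo_regs_t;
#define DEMO_REGS_PARM1(ctx) ((ctx)->regs[0])
#define DEMO_REGS_PARM2(ctx) ((ctx)->regs[1])
#else /* default in this sketch: x86-64 */
typedef struct demo_regs_x86 demo_regs_t;
#define DEMO_REGS_PARM1(ctx) ((ctx)->di)
#define DEMO_REGS_PARM2(ctx) ((ctx)->si)
#endif

/* Architecture-independent code only ever touches the accessor macros. */
uint64_t demo_sum_first_two_args(const demo_regs_t *ctx)
{
    assert(ctx != NULL);
    return DEMO_REGS_PARM1(ctx) + DEMO_REGS_PARM2(ctx);
}
```

Compiling the same file with -DDEMO_TARGET_ARCH_arm64 switches the accessors to the arm64 layout without touching demo_sum_first_two_args, which is the effect the __TARGET_ARCH_* defines have on the real macros.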
The same header file also provides the BPF_KPROBE(name, args...) macro, which allows program authors to define the function signature in the same fashion as the function they are tracing, type info and all. The macro casts the correct argument registers to the given argument names. For example:
SEC("kprobe/proc_sys_write")
int BPF_KPROBE(my_kprobe_example,
    struct file* filp, const char* buf,
    size_t count, loff_t* ppos) {
    ...
}
Similar macros also exist for kprobes intended to attach to syscalls, BPF_KSYSCALL(name, args...), and for kretprobes, BPF_KRETPROBE(name, args...).
Attachment
There are two methods of attaching probe programs, with variations for uprobes. The "legacy" way involves the manual creation of a k{ret}probe or u{ret}probe event via the DebugFS and then attaching a BPF program to that event via the perf_event_open syscall.
The newer method uses BPF links. For multiple probes, a link does both the probe event creation and the attaching in one step. Single probes are still created via the perf_event_open syscall, but with different parameters, and the program is then attached with a BPF link.
Legacy kprobe attaching
The first step is to create a kprobe or kretprobe trace event. To do so we can use the DebugFS, which we assume is mounted at /sys/kernel/debug for the purposes of this document.
Existing kprobe events can be listed by printing /sys/kernel/debug/tracing/kprobe_events, and we can create new events by writing to this pseudo-file. For example, executing echo 'p:myprobe do_sys_open' > /sys/kernel/debug/tracing/kprobe_events will make a new kprobe (p:) called myprobe at the do_sys_open function in the kernel. For details on the full syntax, see the kernel's kprobe-based event tracing documentation. kretprobes are created by specifying an r: prefix.
After the probe has been created, a new directory will appear in /sys/kernel/debug/tracing/events/kprobes/ with the same name as we have given our probe, /sys/kernel/debug/tracing/events/kprobes/myprobe in this case. This directory contains a few pseudo-files; the one that matters to us is id. The file /sys/kernel/debug/tracing/events/kprobes/myprobe/id contains a unique identifier we will need in the next step.
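The string handling behind these steps can be sketched in C as follows. This is a minimal sketch that assumes DebugFS is mounted at /sys/kernel/debug as above; the function names (format_kprobe_def, format_kprobe_id_path, read_kprobe_id) are hypothetical helpers invented for illustration and are not part of any library.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Assumes DebugFS is mounted at /sys/kernel/debug, as in the text. */
#define KPROBE_EVENTS_DIR "/sys/kernel/debug/tracing/events/kprobes"

/* Build the definition written to kprobe_events, e.g. "p:myprobe do_sys_open".
 * Returns 0 on success, -1 if the buffer is too small. */
int format_kprobe_def(char *buf, size_t len, bool is_ret,
                      const char *name, const char *symbol)
{
    assert(buf != NULL && strlen(name) > 0 && strlen(symbol) > 0);
    int n = snprintf(buf, len, "%c:%s %s", is_ret ? 'r' : 'p', name, symbol);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Build the path of the id pseudo-file for the probe called `name`. */
int format_kprobe_id_path(char *buf, size_t len, const char *name)
{
    assert(buf != NULL && strlen(name) > 0);
    int n = snprintf(buf, len, KPROBE_EVENTS_DIR "/%s/id", name);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Read the numeric event ID from an id pseudo-file; -1 on any failure. */
int read_kprobe_id(const char *path)
{
    FILE *f = fopen(path, "r");
    int id = -1;

    if (f == NULL)
        return -1;
    if (fscanf(f, "%d", &id) != 1)
        id = -1;
    fclose(f);
    return id;
}
```

A caller would write the string produced by format_kprobe_def to the kprobe_events pseudo-file, then pass the path from format_kprobe_id_path to read_kprobe_id to obtain the value used as kprobe_id in the next step.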
The next step is to open a new perf event using the perf_event_open syscall:
struct perf_event_attr attr = {
    .type = PERF_TYPE_TRACEPOINT,
    .size = sizeof(struct perf_event_attr),
    .config = kprobe_id, /* The ID of your kprobe */
    .sample_period = 1,
    .sample_type = PERF_SAMPLE_RAW,
    .wakeup_events = 1,
};
syscall(SYS_perf_event_open,
    &attr, /* struct perf_event_attr * */
    -1, /* pid_t pid */
    0, /* int cpu */
    -1, /* int group_fd */
    PERF_FLAG_FD_CLOEXEC /* unsigned long flags */
);
This syscall will return a file descriptor on success. The final step is to issue two ioctl syscalls: one to attach our BPF program to the kprobe event and one to enable the kprobe.
ioctl(perf_event_fd, PERF_EVENT_IOC_SET_BPF, bpf_prog_fd); to attach.
ioctl(perf_event_fd, PERF_EVENT_IOC_ENABLE, 0); to enable.
The kprobe can be temporarily disabled with the PERF_EVENT_IOC_DISABLE ioctl option. Otherwise the kprobe stays attached until the perf event goes away, either because the perf event FD is closed or because the program exits. The perf event holds a reference to the BPF program, so it will stay loaded until no more kprobes reference it.
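Put together, the legacy sequence might be wrapped as in the sketch below. The helper names init_legacy_kprobe_attr and attach_legacy_kprobe are hypothetical; the syscall and ioctl arguments follow the steps described above, with error handling added so that the perf event is not leaked when attaching fails.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* for the syscall() declaration */
#endif
#include <assert.h>
#include <linux/perf_event.h>
#include <stddef.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Fill `attr` for a kprobe trace event whose id file contained `kprobe_id`. */
void init_legacy_kprobe_attr(struct perf_event_attr *attr, int kprobe_id)
{
    assert(attr != NULL && kprobe_id >= 0);
    memset(attr, 0, sizeof(*attr));
    attr->type = PERF_TYPE_TRACEPOINT;
    attr->size = sizeof(*attr);
    attr->config = (unsigned long long)kprobe_id;
    attr->sample_period = 1;
    attr->sample_type = PERF_SAMPLE_RAW;
    attr->wakeup_events = 1;
}

/* Open the perf event, attach the BPF program and enable the probe.
 * Returns the perf event FD, or -1 on failure. Closing the returned FD
 * makes the perf event go away and thereby detaches the program. */
int attach_legacy_kprobe(int kprobe_id, int bpf_prog_fd)
{
    struct perf_event_attr attr;
    int fd;

    init_legacy_kprobe_attr(&attr, kprobe_id);
    fd = (int)syscall(SYS_perf_event_open, &attr, -1 /* pid */, 0 /* cpu */,
                      -1 /* group_fd */, PERF_FLAG_FD_CLOEXEC);
    if (fd < 0)
        return -1;
    if (ioctl(fd, PERF_EVENT_IOC_SET_BPF, bpf_prog_fd) < 0 ||
        ioctl(fd, PERF_EVENT_IOC_ENABLE, 0) < 0) {
        close(fd); /* do not leak the event if attaching fails */
        return -1;
    }
    return fd;
}
```

Returning the perf event FD leaves the lifetime decision to the caller: keeping it open keeps the probe attached, and close() on it is the detach operation.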
Link kprobe attaching
The more modern and preferred way of attaching is to use the link create command of the BPF syscall.
For single probes, first open a new perf event using the perf_event_open syscall. Note that the attribute values of the perf event structure differ slightly from the legacy way.
struct perf_event_attr attr = {
    .type = 8, /* example value; read the real type from /sys/bus/event_source/devices/kprobe/type or uprobe/type */
    .sample_type = PERF_SAMPLE_RAW,
    .sample_period = 1,
    .wakeup_events = 1,
    .size = sizeof(attr),
    .config = 1 << 0, /* bit 0 is the retprobe flag (see /sys/bus/event_source/devices/kprobe/format/retprobe); use 0 for a plain kprobe */
    .kprobe_func = (uint64_t)(uintptr_t)"symbol_name", /* aliases config1; symbol name as a string, valid names can be found in /proc/kallsyms */
    .probe_offset = 0x0, /* aliases config2; offset must be a valid instruction, here it is just the start of the kernel symbol */
};
syscall(SYS_perf_event_open,
    &attr, /* struct perf_event_attr * */
    -1, /* pid_t pid */
    0, /* int cpu */
    -1, /* int group_fd */
    PERF_FLAG_FD_CLOEXEC /* unsigned long flags */
);
After the perf_event_open syscall succeeds, the returned file descriptor is used to set the link_create.target_fd attribute in the bpf_attr structure before the link create command is called.
union bpf_attr attr = {
    .link_create.prog_fd = prog_fd, /* valid fd to bpf program of type KPROBE */
    .link_create.target_fd = perf_fd, /* valid fd to PMU event */
    .link_create.attach_type = BPF_PERF_EVENT,
    .link_create.flags = 0,
    .link_create.perf_event.bpf_cookie = 0,
};
syscall(SYS_bpf,
    BPF_LINK_CREATE,
    &attr,
    sizeof(attr)
);
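Rather than hard-coding the PMU type, a program can read it at run time. The sketch below shows one way to do that and to fill the perf event attributes for a single probe. The helper names read_pmu_type and init_kprobe_pmu_attr are hypothetical, and treating config bit 0 as the return-probe flag is an assumption that should be checked against /sys/bus/event_source/devices/kprobe/format/retprobe on the target kernel.

```c
#include <assert.h>
#include <linux/perf_event.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Read a PMU type number, e.g. from
 * /sys/bus/event_source/devices/kprobe/type; -1 on any failure. */
int read_pmu_type(const char *path)
{
    FILE *f = fopen(path, "r");
    int type = -1;

    if (f == NULL)
        return -1;
    if (fscanf(f, "%d", &type) != 1)
        type = -1;
    fclose(f);
    return type;
}

/* Fill `attr` for a single kprobe (or kretprobe when is_ret is true) on
 * `symbol`. The string must stay valid until perf_event_open is called. */
void init_kprobe_pmu_attr(struct perf_event_attr *attr, int pmu_type,
                          const char *symbol, uint64_t offset, bool is_ret)
{
    assert(attr != NULL && symbol != NULL && pmu_type >= 0);
    memset(attr, 0, sizeof(*attr));
    attr->type = (uint32_t)pmu_type;
    attr->size = sizeof(*attr);
    attr->sample_type = PERF_SAMPLE_RAW;
    attr->sample_period = 1;
    attr->wakeup_events = 1;
    if (is_ret)
        attr->config |= 1ULL << 0; /* assumed retprobe bit, see lead-in */
    attr->kprobe_func = (uint64_t)(uintptr_t)symbol;
    attr->probe_offset = offset;
}
```

The filled structure is then passed to perf_event_open exactly as in the snippet above, and the resulting FD becomes link_create.target_fd.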
For multiple probes, no perf event is needed. The link create command takes the list of symbols directly:
union bpf_attr attr = {
    .link_create.prog_fd = prog_fd,
    .link_create.attach_type = BPF_TRACE_KPROBE_MULTI,
    .link_create.kprobe_multi.cnt = sym_count,
    .link_create.kprobe_multi.cookies = 0,
    .link_create.kprobe_multi.flags = BPF_F_KPROBE_MULTI_RETURN, /* attach as return probes; use 0 for entry probes */
    .link_create.kprobe_multi.syms = (uint64_t)(uintptr_t)sym_name, /* pointer to an array of sym_count symbol name strings */
};
syscall(SYS_bpf,
    BPF_LINK_CREATE,
    &attr,
    sizeof(attr)
);
Helper functions
Not all helper functions are available in all program types. These are the helper calls available for kprobe programs:
Supported helper functions
bpf_cgrp_storage_delete
bpf_cgrp_storage_get
bpf_copy_from_user
bpf_copy_from_user_task
bpf_current_task_under_cgroup
bpf_dynptr_data
bpf_dynptr_from_mem
bpf_dynptr_read
bpf_dynptr_write
bpf_find_vma
bpf_for_each_map_elem
bpf_get_attach_cookie
bpf_get_branch_snapshot
bpf_get_current_ancestor_cgroup_id
bpf_get_current_cgroup_id
bpf_get_current_comm
bpf_get_current_pid_tgid
bpf_get_current_task
bpf_get_current_task_btf
bpf_get_current_uid_gid
bpf_get_func_ip
bpf_get_ns_current_pid_tgid
bpf_get_numa_node_id
bpf_get_prandom_u32
bpf_get_smp_processor_id
bpf_get_stack
bpf_get_stackid
bpf_get_task_stack
bpf_jiffies64
bpf_kptr_xchg
bpf_ktime_get_boot_ns
bpf_ktime_get_ns
bpf_ktime_get_tai_ns
bpf_loop
bpf_map_delete_elem
bpf_map_lookup_elem
bpf_map_lookup_percpu_elem
bpf_map_peek_elem
bpf_map_pop_elem
bpf_map_push_elem
bpf_map_update_elem
bpf_override_return
bpf_per_cpu_ptr
bpf_perf_event_output
bpf_perf_event_read
bpf_perf_event_read_value
bpf_probe_read
bpf_probe_read_kernel
bpf_probe_read_kernel_str
bpf_probe_read_str
bpf_probe_read_user
bpf_probe_read_user_str
bpf_probe_write_user
bpf_ringbuf_discard
bpf_ringbuf_discard_dynptr
bpf_ringbuf_output
bpf_ringbuf_query
bpf_ringbuf_reserve
bpf_ringbuf_reserve_dynptr
bpf_ringbuf_submit
bpf_ringbuf_submit_dynptr
bpf_send_signal
bpf_send_signal_thread
bpf_snprintf
bpf_snprintf_btf
bpf_spin_lock
bpf_spin_unlock
bpf_strncmp
bpf_tail_call
bpf_task_pt_regs
bpf_task_storage_delete
bpf_task_storage_get
bpf_this_cpu_ptr
bpf_timer_cancel
bpf_timer_init
bpf_timer_set_callback
bpf_timer_start
bpf_trace_printk
bpf_trace_vprintk
bpf_user_ringbuf_drain