Program type BPF_PROG_TYPE_TRACEPOINT
BPF_PROG_TYPE_TRACEPOINT programs are eBPF programs that attach to predefined tracepoints in the Linux kernel. These tracepoints are typically placed at locations that are interesting or commonly used for measuring performance.
Usage
Tracepoint programs can attach to trace events. These events are declared with the TRACE_EVENT macro. Take, for example, the xdp_exception trace event. A combination of TP_* macros defines a function prototype for the tracepoint, a structure that will be passed to any handlers, and a conversion method for going from the arguments to the structure.
The TRACE_EVENT macro makes the tracepoint available via a function with the trace_ prefix followed by the name. So calling trace_xdp_exception fires the xdp_exception event, which can happen from any number of locations in the code. The attached eBPF program will be called every time the trace event fires.
We can use tracefs to list all available trace events. For the sake of this page we will assume tracefs is mounted at /sys/kernel/tracing (which is usual for most distros). The /sys/kernel/tracing/events/ directory contains one subdirectory per group of events. The events are grouped by the first word in their name, so all kvm_* events reside in /sys/kernel/tracing/events/kvm, and xdp_exception is located in /sys/kernel/tracing/events/xdp/xdp_exception. We will refer to this directory as the "event directory".
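As a minimal sketch of the directory layout described above, the helper below builds the event directory path from a group and an event name. The function name `event_dir_path` and the `TRACEFS_EVENTS` macro are hypothetical, and the mount point is the one assumed on this page.

```c
#include <stdio.h>

/* Assumed tracefs mount point, as used throughout this page. */
#define TRACEFS_EVENTS "/sys/kernel/tracing/events"

/*
 * Hypothetical helper: writes the event directory for `group`/`event`
 * into `buf`. Returns 0 on success, -1 if the path does not fit.
 */
static int event_dir_path(char *buf, size_t len,
                          const char *group, const char *event)
{
    int n = snprintf(buf, len, "%s/%s/%s", TRACEFS_EVENTS, group, event);

    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

Calling `event_dir_path(buf, sizeof(buf), "xdp", "xdp_exception")` fills `buf` with the event directory used in the rest of this page.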
Context
The context for a tracepoint program is a pointer to a structure whose type is different for each trace event. The event directory contains a pseudo-file called format, so for xdp_exception that would be /sys/kernel/tracing/events/xdp/xdp_exception/format. We can read this file to get the layout of the struct type:
$ cat /sys/kernel/tracing/events/xdp/xdp_exception/format
name: xdp_exception
ID: 488
format:
field:unsigned short common_type; offset:0; size:2; signed:0;
field:unsigned char common_flags; offset:2; size:1; signed:0;
field:unsigned char common_preempt_count; offset:3; size:1; signed:0;
field:int common_pid; offset:4; size:4; signed:1;
field:int prog_id; offset:8; size:4; signed:1;
field:u32 act; offset:12; size:4; signed:0;
field:int ifindex; offset:16; size:4; signed:1;
print fmt: "prog_id=%d action=%s ifindex=%d", REC->prog_id, __print_symbolic(REC->act, { 0, "ABORTED" }, { 1, "DROP" }, { 2, "PASS" }, { 3, "TX" }, { 4, "REDIRECT" }, { -1, ((void *)0) }), REC->ifindex
From this output we can reconstruct the context, which as a C struct would look like:
struct xdp_exception_ctx {
    __u16 common_type;
    __u8 common_flags;
    __u8 common_preempt_count;
    __s32 common_pid;
    __s32 prog_id;
    __u32 act;
    __s32 ifindex;
};
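One way to sanity-check a hand-written context struct is to assert its field offsets against the `offset:` and `size:` values from the format file. The sketch below does this at compile time for the xdp_exception output shown above. It is an illustration only: it uses `<stdint.h>` types in place of the kernel's `__u16`-style types so that it compiles in plain userspace, and the numbers are only valid for the format output on this page.

```c
#include <stddef.h>
#include <stdint.h>

/* Mirror of the reconstructed context, using <stdint.h> types. */
struct xdp_exception_ctx {
    uint16_t common_type;
    uint8_t  common_flags;
    uint8_t  common_preempt_count;
    int32_t  common_pid;
    int32_t  prog_id;
    uint32_t act;
    int32_t  ifindex;
};

/* Offsets taken from the `offset:` column of the format file. */
_Static_assert(offsetof(struct xdp_exception_ctx, common_type) == 0, "common_type");
_Static_assert(offsetof(struct xdp_exception_ctx, common_flags) == 2, "common_flags");
_Static_assert(offsetof(struct xdp_exception_ctx, common_preempt_count) == 3, "common_preempt_count");
_Static_assert(offsetof(struct xdp_exception_ctx, common_pid) == 4, "common_pid");
_Static_assert(offsetof(struct xdp_exception_ctx, prog_id) == 8, "prog_id");
_Static_assert(offsetof(struct xdp_exception_ctx, act) == 12, "act");
_Static_assert(offsetof(struct xdp_exception_ctx, ifindex) == 16, "ifindex");

/* Last field ends at offset 16 + size 4. */
_Static_assert(sizeof(struct xdp_exception_ctx) == 20, "total size");
```

If a field is misnamed or misordered, the build fails instead of the program silently reading the wrong bytes.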
Attachment
There are three methods of attaching tracepoint programs, listed here from oldest and least recommended to newest and most recommended. All methods have the first part in common.
We start by looking up the event ID in tracefs. Inside the event directory is a pseudo-file called id, so for xdp_exception that would be /sys/kernel/tracing/events/xdp/xdp_exception/id. Reading the file returns a decimal number.
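Reading the ID could look roughly like the sketch below. The helper name `read_event_id` is hypothetical, and error handling is kept to a minimum.

```c
#include <stdio.h>

/*
 * Hypothetical helper: reads the decimal event ID from an `id`
 * pseudo-file. Returns the ID, or -1 on error.
 */
static int read_event_id(const char *id_path)
{
    FILE *f = fopen(id_path, "r");
    int id;

    if (!f)
        return -1;
    if (fscanf(f, "%d", &id) != 1)
        id = -1;
    fclose(f);
    return id;
}
```

For the running example the call would be `read_event_id("/sys/kernel/tracing/events/xdp/xdp_exception/id")`. Reading tracefs usually requires elevated privileges, so a return value of -1 may simply mean the file could not be opened.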
The next step is to open a new perf event using the perf_event_open syscall:
struct perf_event_attr attr = {
    .type = PERF_TYPE_TRACEPOINT,
    .size = sizeof(struct perf_event_attr),
    .config = event_id, /* The ID of your trace event */
    .sample_period = 1,
    .sample_type = PERF_SAMPLE_RAW,
    .wakeup_events = 1,
};
syscall(SYS_perf_event_open,
    &attr, /* struct perf_event_attr * */
    -1, /* pid_t pid */
    0, /* int cpu */
    -1, /* int group_fd */
    PERF_FLAG_FD_CLOEXEC /* unsigned long flags */
);
This syscall will return a file descriptor on success.
ioctl method
This is the oldest and least recommended method. After we have the perf event file descriptor, we execute two ioctl syscalls: one to attach our BPF program to the trace event and one to enable the trace.
ioctl(perf_event_fd, PERF_EVENT_IOC_SET_BPF, bpf_prog_fd);
to attach.
ioctl(perf_event_fd, PERF_EVENT_IOC_ENABLE, 0);
to enable.
The tracepoint can be temporarily disabled with the PERF_EVENT_IOC_DISABLE ioctl option. Otherwise the tracepoint stays attached until the perf event goes away, either because the perf event FD is closed or because the program exits. The perf event holds a reference to the BPF program, so it will stay loaded until no more tracepoints reference it.
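The two ioctl calls can be wrapped with basic error checking, as in the sketch below. The helper name `attach_tracepoint_ioctl` is hypothetical. It assumes the perf event was opened as described earlier and that the BPF program has already been loaded.

```c
#include <sys/ioctl.h>
#include <linux/perf_event.h>

/*
 * Hypothetical helper: attaches an already loaded BPF program to an
 * open tracepoint perf event and enables it.
 * Returns 0 on success, -1 on failure (errno is set by ioctl).
 */
static int attach_tracepoint_ioctl(int perf_event_fd, int bpf_prog_fd)
{
    /* Attach the program to the trace event. */
    if (ioctl(perf_event_fd, PERF_EVENT_IOC_SET_BPF, bpf_prog_fd) < 0)
        return -1;

    /* Enable the event so the program starts being invoked. */
    if (ioctl(perf_event_fd, PERF_EVENT_IOC_ENABLE, 0) < 0)
        return -1;

    return 0;
}
```

The caller keeps ownership of both file descriptors. Closing `perf_event_fd` detaches the program.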
perf_event_open PMU
Docs could be improved
This part of the docs is incomplete, contributions are very welcome
BPF link
This is the newest and most recommended method of attaching tracepoint programs.
After we have the perf event file descriptor, we attach the program by creating a BPF link via the BPF_LINK_CREATE syscall command.
We call the syscall command with attach_type set to BPF_PERF_EVENT, target_fd set to the perf event file descriptor, prog_fd set to the file descriptor of the tracepoint program, and optionally a cookie.
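A raw-syscall sketch of the link creation step is shown below. The helper name `attach_tracepoint_link` is hypothetical, and the code assumes kernel UAPI headers recent enough to define BPF_PERF_EVENT and the `link_create.perf_event.bpf_cookie` field. In practice a loader library would normally make this call for you.

```c
#define _GNU_SOURCE
#include <string.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <linux/bpf.h>

/*
 * Hypothetical helper: creates a BPF link between a loaded tracepoint
 * program and an open tracepoint perf event.
 * Returns the link file descriptor on success, -1 on failure.
 */
static int attach_tracepoint_link(int prog_fd, int perf_event_fd, __u64 cookie)
{
    union bpf_attr attr;

    /* Unused fields of the attribute union must be zero. */
    memset(&attr, 0, sizeof(attr));
    attr.link_create.prog_fd = prog_fd;
    attr.link_create.target_fd = perf_event_fd;
    attr.link_create.attach_type = BPF_PERF_EVENT;
    attr.link_create.perf_event.bpf_cookie = cookie; /* optional, 0 if unused */

    return (int)syscall(SYS_bpf, BPF_LINK_CREATE, &attr, sizeof(attr));
}
```

The cookie passed here is the value a program can read back with the bpf_get_attach_cookie helper, which lets one program tell apart several attachments.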
Helper functions
Supported helper functions
bpf_perf_event_output
bpf_get_stackid
bpf_get_stack
bpf_get_attach_cookie
bpf_map_lookup_elem
bpf_map_update_elem
bpf_map_delete_elem
bpf_map_push_elem
bpf_map_pop_elem
bpf_map_peek_elem
bpf_map_lookup_percpu_elem
bpf_ktime_get_ns
bpf_ktime_get_boot_ns
bpf_tail_call
bpf_get_current_pid_tgid
bpf_get_current_task
bpf_get_current_task_btf
bpf_task_pt_regs
bpf_get_current_uid_gid
bpf_get_current_comm
bpf_trace_printk
bpf_get_smp_processor_id
bpf_get_numa_node_id
bpf_perf_event_read
bpf_current_task_under_cgroup
bpf_get_prandom_u32
bpf_probe_write_user
bpf_probe_read_user
bpf_probe_read_kernel
bpf_probe_read_user_str
bpf_probe_read_kernel_str
bpf_probe_read
bpf_probe_read_str
bpf_get_current_cgroup_id
bpf_get_current_ancestor_cgroup_id
bpf_send_signal
bpf_send_signal_thread
bpf_perf_event_read_value
bpf_get_ns_current_pid_tgid
bpf_ringbuf_output
bpf_ringbuf_reserve
bpf_ringbuf_submit
bpf_ringbuf_discard
bpf_ringbuf_query
bpf_jiffies64
bpf_get_task_stack
bpf_copy_from_user
bpf_copy_from_user_task
bpf_snprintf_btf
bpf_per_cpu_ptr
bpf_this_cpu_ptr
bpf_task_storage_get
bpf_task_storage_delete
bpf_for_each_map_elem
bpf_snprintf
bpf_get_func_ip
bpf_get_branch_snapshot
bpf_find_vma
bpf_trace_vprintk
bpf_loop
bpf_strncmp
bpf_spin_lock
bpf_spin_unlock
bpf_timer_init
bpf_timer_set_callback
bpf_timer_start
bpf_timer_cancel
bpf_cgrp_storage_get
bpf_cgrp_storage_delete
bpf_dynptr_data
bpf_dynptr_from_mem
bpf_dynptr_read
bpf_dynptr_write
bpf_kptr_xchg
bpf_ktime_get_tai_ns
bpf_ringbuf_discard_dynptr
bpf_ringbuf_reserve_dynptr
bpf_ringbuf_submit_dynptr
bpf_user_ringbuf_drain
KFuncs
There are currently no kfuncs supported for this program type