Program type BPF_PROG_TYPE_TRACEPOINT
BPF_PROG_TYPE_TRACEPOINT programs are eBPF programs that attach to pre-defined tracepoints in the Linux kernel. These tracepoints are often placed at locations that are interesting or commonly used for measuring performance.
Usage
Tracepoint programs can attach to trace events. These events are declared with the TRACE_EVENT macro. Take for example the xdp_exception trace event. A combination of TP_* macros defines a function prototype for the tracepoint, the structure that will be passed to any handlers, and the conversion from the tracepoint arguments to that structure.
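For illustration, such a trace event definition looks roughly like the abridged sketch below. It is modelled on the xdp_exception definition in include/trace/events/xdp.h and on the fields shown later on this page; the exact kernel source may differ in detail.

TRACE_EVENT(xdp_exception,
    /* Prototype of the trace_xdp_exception() function */
    TP_PROTO(const struct net_device *dev,
             const struct bpf_prog *xdp, u32 act),
    TP_ARGS(dev, xdp, act),
    /* Layout of the structure passed to attached handlers */
    TP_STRUCT__entry(
        __field(int, prog_id)
        __field(u32, act)
        __field(int, ifindex)
    ),
    /* Conversion from the arguments to the structure */
    TP_fast_assign(
        __entry->prog_id = xdp->aux->id;
        __entry->act     = act;
        __entry->ifindex = dev->ifindex;
    ),
    TP_printk("prog_id=%d action=%s ifindex=%d",
              __entry->prog_id,
              __print_symbolic(__entry->act, __XDP_ACT_SYM_TAB),
              __entry->ifindex)
);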
The TRACE_EVENT macro makes a tracepoint available via a function named with the trace_ prefix followed by the event name. So trace_xdp_exception fires the xdp_exception event, and it can be called from any number of locations in the kernel code. The attached eBPF program is invoked for every invocation of the tracepoint.
We can use the tracefs to list all of the available trace events. For the sake of this page we will assume the tracefs is mounted at /sys/kernel/tracing (which is the usual location on most distros). The /sys/kernel/tracing/events/ directory contains a sub-directory per event group; events are grouped by the first word in their name, so all kvm_* events reside in /sys/kernel/tracing/events/kvm. The xdp_exception event is therefore located in /sys/kernel/tracing/events/xdp/xdp_exception. We will refer to this directory as the "event directory".
Context
The context for a tracepoint program is a pointer to a structure whose type is different for each trace event. The event directory contains a pseudo-file called format, so for xdp_exception that would be /sys/kernel/tracing/events/xdp/xdp_exception/format. We can read this file to get the layout of the struct type:
$ cat /sys/kernel/tracing/events/xdp/xdp_exception/format
name: xdp_exception
ID: 488
format:
field:unsigned short common_type; offset:0; size:2; signed:0;
field:unsigned char common_flags; offset:2; size:1; signed:0;
field:unsigned char common_preempt_count; offset:3; size:1; signed:0;
field:int common_pid; offset:4; size:4; signed:1;
field:int prog_id; offset:8; size:4; signed:1;
field:u32 act; offset:12; size:4; signed:0;
field:int ifindex; offset:16; size:4; signed:1;
print fmt: "prog_id=%d action=%s ifindex=%d", REC->prog_id, __print_symbolic(REC->act, { 0, "ABORTED" }, { 1, "DROP" }, { 2, "PASS" }, { 3, "TX" }, { 4, "REDIRECT" }, { -1, ((void *)0) }), REC->ifindex
From this output we can reconstruct the context, which as a C struct would look like this:
struct xdp_exception_ctx {
    __u16 common_type;
    __u8  common_flags;
    __u8  common_preempt_count;
    __s32 common_pid;
    __s32 prog_id;
    __u32 act;
    __s32 ifindex;
};
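As an example, a minimal tracepoint program using this context could look like the sketch below. It follows the libbpf SEC("tracepoint/<category>/<name>") section naming convention; the program name log_xdp_exception is just an illustration.

#include <linux/types.h>
#include <bpf/bpf_helpers.h>

/* Context layout reconstructed from the format file above */
struct xdp_exception_ctx {
    __u16 common_type;
    __u8  common_flags;
    __u8  common_preempt_count;
    __s32 common_pid;
    __s32 prog_id;
    __u32 act;
    __s32 ifindex;
};

/* Called for every invocation of the xdp_exception tracepoint */
SEC("tracepoint/xdp/xdp_exception")
int log_xdp_exception(struct xdp_exception_ctx *ctx)
{
    bpf_printk("xdp_exception: prog_id=%d act=%u ifindex=%d",
               ctx->prog_id, ctx->act, ctx->ifindex);
    return 0;
}

char LICENSE[] SEC("license") = "GPL";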
Attachment
There are three methods of attaching tracepoint programs, listed here from oldest and least recommended to newest and most recommended. All methods share the same first steps.
We start by looking up the event ID in the tracefs. The event directory contains a pseudo-file called id, so for xdp_exception that would be /sys/kernel/tracing/events/xdp/xdp_exception/id. Reading this file returns the event ID as a decimal number.
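For example, a small helper to read the event ID could look like this sketch (assuming the /sys/kernel/tracing mount point used throughout this page; read_tracepoint_id is an illustrative name, not an existing API):

#include <stdio.h>

/* Returns the ID of events/<category>/<name>, or -1 on error. */
static int read_tracepoint_id(const char *category, const char *name)
{
    char path[256];
    FILE *f;
    int id = -1;

    snprintf(path, sizeof(path),
             "/sys/kernel/tracing/events/%s/%s/id", category, name);
    f = fopen(path, "r");
    if (!f)
        return -1;
    if (fscanf(f, "%d", &id) != 1)
        id = -1;
    fclose(f);
    return id;
}

For xdp_exception this would be called as read_tracepoint_id("xdp", "xdp_exception").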
Next step is to open a new perf event using the perf_event_open
syscall:
#include <linux/perf_event.h>
#include <sys/syscall.h>
#include <unistd.h>

struct perf_event_attr attr = {
    .type = PERF_TYPE_TRACEPOINT,
    .size = sizeof(struct perf_event_attr),
    .config = event_id, /* The ID of your trace event */
    .sample_period = 1,
    .sample_type = PERF_SAMPLE_RAW,
    .wakeup_events = 1,
};

int perf_event_fd = syscall(SYS_perf_event_open,
    &attr,               /* struct perf_event_attr * */
    -1,                  /* pid_t pid */
    0,                   /* int cpu */
    -1,                  /* int group_fd */
    PERF_FLAG_FD_CLOEXEC /* unsigned long flags */
);
This syscall will return a file descriptor on success.
ioctl method
This is the oldest and least recommended method. After we have the perf event file descriptor we execute two ioctl syscalls: one to attach our BPF program to the trace event and one to enable the trace.
ioctl(perf_event_fd, PERF_EVENT_IOC_SET_BPF, bpf_prog_fd);
to attach.
ioctl(perf_event_fd, PERF_EVENT_IOC_ENABLE, 0);
to enable.
The tracepoint can be temporarily disabled with the PERF_EVENT_IOC_DISABLE ioctl option. Otherwise the tracepoint stays attached until the perf event goes away, either because the perf event FD is closed or because the program exits. The perf event holds a reference to the BPF program, so the program will stay loaded until no more tracepoints reference it.
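Putting the two calls together with error handling, an attach helper could look like this sketch (attach_and_enable is an illustrative name):

#include <sys/ioctl.h>
#include <linux/perf_event.h>

/* Attach the BPF program to the perf event, then enable the event. */
static int attach_and_enable(int perf_event_fd, int bpf_prog_fd)
{
    if (ioctl(perf_event_fd, PERF_EVENT_IOC_SET_BPF, bpf_prog_fd) < 0)
        return -1;
    if (ioctl(perf_event_fd, PERF_EVENT_IOC_ENABLE, 0) < 0)
        return -1;
    return 0;
}

/* To pause the tracepoint temporarily:
 *   ioctl(perf_event_fd, PERF_EVENT_IOC_DISABLE, 0);
 */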
perf_event_open PMU
Docs could be improved
This part of the docs is incomplete, contributions are very welcome
BPF link
This is the newest and most recommended method of attaching tracepoint programs.
After we have obtained the perf event file descriptor we attach the program by creating a BPF link via the BPF_LINK_CREATE syscall command.
We call the syscall command with the BPF_PERF_EVENT attach_type, target_fd set to the perf event file descriptor, prog_fd set to the file descriptor of the tracepoint program, and optionally a cookie, whose value the program can later retrieve with bpf_get_attach_cookie.
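Using libbpf's bpf_link_create() wrapper around this syscall command, the attachment could look like the sketch below (the cookie value is an arbitrary example; the program can read it back with bpf_get_attach_cookie):

#include <bpf/bpf.h>

/* Create a BPF link attaching prog_fd to the tracepoint's perf event. */
static int attach_tracepoint_link(int perf_event_fd, int prog_fd)
{
    LIBBPF_OPTS(bpf_link_create_opts, opts,
        .perf_event.bpf_cookie = 1234, /* optional, example value */
    );

    /* Returns a link file descriptor on success, a negative error otherwise.
     * The program stays attached for as long as the link exists. */
    return bpf_link_create(prog_fd, perf_event_fd, BPF_PERF_EVENT, &opts);
}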
Helper functions
Supported helper functions
bpf_cgrp_storage_delete
bpf_cgrp_storage_get
bpf_copy_from_user
bpf_copy_from_user_task
bpf_current_task_under_cgroup
bpf_dynptr_data
bpf_dynptr_from_mem
bpf_dynptr_read
bpf_dynptr_write
bpf_find_vma
bpf_for_each_map_elem
bpf_get_attach_cookie
bpf_get_branch_snapshot
bpf_get_current_ancestor_cgroup_id
bpf_get_current_cgroup_id
bpf_get_current_comm
bpf_get_current_pid_tgid
bpf_get_current_task
bpf_get_current_task_btf
bpf_get_current_uid_gid
bpf_get_func_ip
bpf_get_ns_current_pid_tgid
bpf_get_numa_node_id
bpf_get_prandom_u32
bpf_get_smp_processor_id
bpf_get_stack
bpf_get_stackid
bpf_get_task_stack
bpf_jiffies64
bpf_kptr_xchg
bpf_ktime_get_boot_ns
bpf_ktime_get_ns
bpf_ktime_get_tai_ns
bpf_loop
bpf_map_delete_elem
bpf_map_lookup_elem
bpf_map_lookup_percpu_elem
bpf_map_peek_elem
bpf_map_pop_elem
bpf_map_push_elem
bpf_map_update_elem
bpf_per_cpu_ptr
bpf_perf_event_output
bpf_perf_event_read
bpf_perf_event_read_value
bpf_probe_read
bpf_probe_read_kernel
bpf_probe_read_kernel_str
bpf_probe_read_str
bpf_probe_read_user
bpf_probe_read_user_str
bpf_probe_write_user
bpf_ringbuf_discard
bpf_ringbuf_discard_dynptr
bpf_ringbuf_output
bpf_ringbuf_query
bpf_ringbuf_reserve
bpf_ringbuf_reserve_dynptr
bpf_ringbuf_submit
bpf_ringbuf_submit_dynptr
bpf_send_signal
bpf_send_signal_thread
bpf_snprintf
bpf_snprintf_btf
bpf_spin_lock
bpf_spin_unlock
bpf_strncmp
bpf_tail_call
bpf_task_pt_regs
bpf_task_storage_delete
bpf_task_storage_get
bpf_this_cpu_ptr
bpf_timer_cancel
bpf_timer_init
bpf_timer_set_callback
bpf_timer_start
bpf_trace_printk
bpf_trace_vprintk
bpf_user_ringbuf_drain
KFuncs
Supported kfuncs
bpf_arena_alloc_pages (since v6.12)
bpf_arena_free_pages (since v6.12)
bpf_cast_to_kern_ctx (since v6.12)
bpf_cgroup_acquire (since v6.12)
bpf_cgroup_ancestor (since v6.12)
bpf_cgroup_from_id (since v6.12)
bpf_cgroup_release (since v6.12)
bpf_copy_from_user_str (since v6.12)
bpf_cpumask_acquire (since v6.12)
bpf_cpumask_and (since v6.12)
bpf_cpumask_any_and_distribute (since v6.12)
bpf_cpumask_any_distribute (since v6.12)
bpf_cpumask_clear (since v6.12)
bpf_cpumask_clear_cpu (since v6.12)
bpf_cpumask_copy (since v6.12)
bpf_cpumask_create (since v6.12)
bpf_cpumask_empty (since v6.12)
bpf_cpumask_equal (since v6.12)
bpf_cpumask_first (since v6.12)
bpf_cpumask_first_and (since v6.12)
bpf_cpumask_first_zero (since v6.12)
bpf_cpumask_full (since v6.12)
bpf_cpumask_intersects (since v6.12)
bpf_cpumask_or (since v6.12)
bpf_cpumask_release (since v6.12)
bpf_cpumask_set_cpu (since v6.12)
bpf_cpumask_setall (since v6.12)
bpf_cpumask_subset (since v6.12)
bpf_cpumask_test_and_clear_cpu (since v6.12)
bpf_cpumask_test_and_set_cpu (since v6.12)
bpf_cpumask_test_cpu (since v6.12)
bpf_cpumask_weight (since v6.12)
bpf_cpumask_xor (since v6.12)
bpf_dynptr_adjust (since v6.12)
bpf_dynptr_clone (since v6.12)
bpf_dynptr_from_skb (since v6.12)
bpf_dynptr_is_null (since v6.12)
bpf_dynptr_is_rdonly (since v6.12)
bpf_dynptr_size (since v6.12)
bpf_dynptr_slice (since v6.12)
bpf_dynptr_slice_rdwr (since v6.12)
bpf_get_dentry_xattr (since v6.12)
bpf_get_file_xattr (since v6.12)
bpf_get_fsverity_digest (since v6.12)
bpf_get_kmem_cache (since v6.12)
bpf_get_task_exe_file (since v6.12)
bpf_iter_bits_destroy (since v6.12)
bpf_iter_bits_new (since v6.12)
bpf_iter_bits_next (since v6.12)
bpf_iter_css_destroy (since v6.12)
bpf_iter_css_new (since v6.12)
bpf_iter_css_next (since v6.12)
bpf_iter_css_task_destroy (since v6.12)
bpf_iter_css_task_new (since v6.12)
bpf_iter_css_task_next (since v6.12)
bpf_iter_kmem_cache_destroy (since v6.12)
bpf_iter_kmem_cache_new (since v6.12)
bpf_iter_kmem_cache_next (since v6.12)
bpf_iter_num_destroy (since v6.12)
bpf_iter_num_new (since v6.12)
bpf_iter_num_next (since v6.12)
bpf_iter_scx_dsq_destroy (since v6.12)
bpf_iter_scx_dsq_new (since v6.12)
bpf_iter_scx_dsq_next (since v6.12)
bpf_iter_task_destroy (since v6.12)
bpf_iter_task_new (since v6.12)
bpf_iter_task_next (since v6.12)
bpf_iter_task_vma_destroy (since v6.12)
bpf_iter_task_vma_new (since v6.12)
bpf_iter_task_vma_next (since v6.12)
bpf_key_put (since v6.12)
bpf_list_pop_back (since v6.12)
bpf_list_pop_front (since v6.12)
bpf_list_push_back_impl (since v6.12)
bpf_list_push_front_impl (since v6.12)
bpf_local_irq_restore (since v6.12)
bpf_local_irq_save (since v6.12)
bpf_lookup_system_key (since v6.12)
bpf_lookup_user_key (since v6.12)
bpf_map_sum_elem_count (since v6.12)
bpf_obj_drop_impl (since v6.12)
bpf_obj_new_impl (since v6.12)
bpf_path_d_path (since v6.12)
bpf_percpu_obj_drop_impl (since v6.12)
bpf_percpu_obj_new_impl (since v6.12)
bpf_preempt_disable (since v6.12)
bpf_preempt_enable (since v6.12)
bpf_put_file (since v6.12)
bpf_rbtree_add_impl (since v6.12)
bpf_rbtree_first (since v6.12)
bpf_rbtree_remove (since v6.12)
bpf_rcu_read_lock (since v6.12)
bpf_rcu_read_unlock (since v6.12)
bpf_rdonly_cast (since v6.12)
bpf_refcount_acquire_impl (since v6.12)
bpf_send_signal_task (since v6.12)
bpf_sock_destroy (since v6.12)
bpf_task_acquire (since v6.12)
bpf_task_from_pid (since v6.12)
bpf_task_from_vpid (since v6.12)
bpf_task_get_cgroup1 (since v6.12)
bpf_task_release (since v6.12)
bpf_task_under_cgroup (since v6.12)
bpf_throw (since v6.12)
bpf_verify_pkcs7_signature (since v6.12)
bpf_wq_init (since v6.12)
bpf_wq_set_callback_impl (since v6.12)
bpf_wq_start (since v6.12)
cgroup_rstat_flush (since v6.12)
cgroup_rstat_updated (since v6.12)
crash_kexec (since v6.12)
scx_bpf_cpu_rq (since v6.12)
scx_bpf_cpuperf_cap (since v6.12)
scx_bpf_cpuperf_cur (since v6.12)
scx_bpf_cpuperf_set (since v6.12)
scx_bpf_destroy_dsq (since v6.12)
scx_bpf_dsq_nr_queued (since v6.12)
scx_bpf_dump_bstr (since v6.12)
scx_bpf_error_bstr (since v6.12)
scx_bpf_exit_bstr (since v6.12)
scx_bpf_get_idle_cpumask (since v6.12)
scx_bpf_get_idle_smtmask (since v6.12)
scx_bpf_get_online_cpumask (since v6.12)
scx_bpf_get_possible_cpumask (since v6.12)
scx_bpf_kick_cpu (since v6.12)
scx_bpf_now (since v6.12)
scx_bpf_nr_cpu_ids (since v6.12)
scx_bpf_pick_any_cpu (since v6.12)
scx_bpf_pick_idle_cpu (since v6.12)
scx_bpf_put_cpumask (since v6.12)
scx_bpf_put_idle_cpumask (since v6.12)
scx_bpf_task_cgroup (since v6.12)
scx_bpf_task_cpu (since v6.12)
scx_bpf_task_running (since v6.12)
scx_bpf_test_and_clear_cpu_idle (since v6.12)