BPF Syscall BPF_PROG_LOAD command
The BPF_PROG_LOAD command loads a program into the kernel.
Return type
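On success, this command returns a new file descriptor referring to the loaded program (a non-negative integer). If the program could not be loaded for any reason, the kernel returns a negative error number instead. Through the libc syscall() wrapper, this surfaces as a return value of -1 with errno set to that error number.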
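A minimal sketch of invoking the command directly through syscall(2), assuming a Linux system with the kernel UAPI headers installed and no libbpf. The helper name prog_load is hypothetical; it converts the libc convention back into the kernel's fd-or-negative-error convention described above.

```c
#include <errno.h>
#include <linux/bpf.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Issue BPF_PROG_LOAD and return the kernel's convention: a new program fd
 * (>= 0) on success, or a negative error number on failure. glibc's syscall()
 * reports errors as -1 with errno set, so the error is converted back here. */
static int prog_load(union bpf_attr *attr)
{
    long ret = syscall(__NR_bpf, BPF_PROG_LOAD, attr, sizeof(*attr));
    if (ret < 0)
        return -errno;
    return (int)ret;
}
```

Zeroing the whole union with memset() before filling in any fields is the usual practice, so that every attribute left unset is 0. A successful result is an ordinary file descriptor and should eventually be released with close().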
Attributes
union bpf_attr {
    struct { /* anonymous struct used by BPF_PROG_LOAD command */
        __u32 prog_type;
        __u32 insn_cnt;
        __aligned_u64 insns;
        __aligned_u64 license;
        __u32 log_level;
        __u32 log_size;
        __aligned_u64 log_buf;
        __u32 kern_version;
        __u32 prog_flags;
        char prog_name[BPF_OBJ_NAME_LEN];
        __u32 prog_ifindex;
        __u32 expected_attach_type;
        __u32 prog_btf_fd;
        __u32 func_info_rec_size;
        __aligned_u64 func_info;
        __u32 func_info_cnt;
        __u32 line_info_rec_size;
        __aligned_u64 line_info;
        __u32 line_info_cnt;
        __u32 attach_btf_id;
        union {
            __u32 attach_prog_fd;
            __u32 attach_btf_obj_fd;
        };
        __u32 core_relo_cnt;
        __aligned_u64 fd_array;
        __aligned_u64 core_relos;
        __u32 core_relo_rec_size;
        __u32 log_true_size;
        __s32 prog_token_fd;
        __u32 fd_array_cnt;
    };
};
prog_type
This attribute specifies the type of the program to be loaded and must be one of the types defined in program types.
insn_cnt
This attribute specifies the number of eBPF instructions passed in insns. The kernel uses it to determine how much memory to read, so it must be correct. If only the size in bytes is known, divide it by 8, since every eBPF instruction is 8 bytes wide.
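A small sketch of deriving insn_cnt from a byte length, for example the size of an ELF program section; insn_cnt_from_bytes is a hypothetical helper. Note that a 64-bit immediate load (BPF_LD | BPF_DW | BPF_IMM) occupies two 8-byte slots and therefore counts as two instructions.

```c
#include <linux/bpf.h>
#include <stddef.h>

/* Convert the byte length of an instruction blob into the value expected by
 * insn_cnt. Returns 0 (itself an invalid insn_cnt) if the length is not a
 * whole number of instructions. */
static __u32 insn_cnt_from_bytes(size_t nbytes)
{
    /* struct bpf_insn is 8 bytes: opcode, dst/src registers, offset, immediate */
    if (nbytes % sizeof(struct bpf_insn) != 0)
        return 0;
    return (__u32)(nbytes / sizeof(struct bpf_insn));
}
```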
insns
This attribute specifies the actual eBPF instructions of the program to be loaded. It should be a pointer to memory containing the instructions. The size of this blob is indicated by insn_cnt.
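As an illustration, the smallest valid program simply sets the return value register r0 and exits. The sketch below encodes it by hand with the struct bpf_insn fields and opcode macros from linux/bpf.h; min_prog and set_insns are hypothetical names. Pointer attributes such as insns are passed as 64-bit integers.

```c
#include <linux/bpf.h>

/* r0 = 0; exit */
static struct bpf_insn min_prog[] = {
    { .code = BPF_ALU64 | BPF_MOV | BPF_K, .dst_reg = BPF_REG_0, .imm = 0 }, /* r0 = 0 */
    { .code = BPF_JMP | BPF_EXIT },                                          /* exit   */
};

static void set_insns(union bpf_attr *attr, const struct bpf_insn *insns, __u32 cnt)
{
    /* The pointer is stored as a 64-bit integer so that the attribute layout
     * is the same for 32-bit and 64-bit user space. */
    attr->insns = (__u64)(unsigned long)insns;
    attr->insn_cnt = cnt;
}
```

Combined with a license string and a program type such as BPF_PROG_TYPE_SOCKET_FILTER, this is typically enough for the verifier to accept the program.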
license
This attribute specifies the software license of the eBPF program to be loaded. It should be a pointer to a null-terminated string containing the human-readable license, for example GPL, MIT or Proprietary.
A number of helper functions in the kernel are GPL-licensed and may only be called from "GPL compatible" programs. The following license strings are recognized as "GPL compatible":
- GPL
- GPL v2
- GPL and additional rights
- Dual BSD/GPL
- Dual MIT/GPL
- Dual MPL/GPL
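A sketch of a user-space check against the list above, useful for warning early when a program that calls GPL-only helpers has an incompatible license. It assumes exact, case-sensitive string comparison, which is our understanding of how the kernel matches these strings; is_gpl_compatible_license is a hypothetical helper, not the kernel's own function.

```c
#include <string.h>

/* Returns 1 if license exactly matches one of the "GPL compatible" strings
 * listed above, 0 otherwise. */
static int is_gpl_compatible_license(const char *license)
{
    static const char *const gpl_compatible[] = {
        "GPL", "GPL v2", "GPL and additional rights",
        "Dual BSD/GPL", "Dual MIT/GPL", "Dual MPL/GPL",
    };
    for (size_t i = 0; i < sizeof(gpl_compatible) / sizeof(gpl_compatible[0]); i++)
        if (strcmp(license, gpl_compatible[i]) == 0)
            return 1;
    return 0;
}
```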
log_level
The lower 2 bits of this value are the log level:
- 0 = no log
- 1 = basic logging (BPF_LOG_LEVEL1)
- 2 = verbose logging (BPF_LOG_LEVEL2)
The remaining bits are flags:
- 1 << 2 (BPF_LOG_STATS): if set, the kernel will output verifier statistics to the log. Flags can be used since v5.2.
- 1 << 3 (BPF_LOG_FIXED): since v6.4, the verifier log rotates instead of truncating when log_size is exceeded. Setting this flag preserves the old behavior of truncating the log to log_size bytes.
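A minimal sketch of composing log_level. The names BPF_LOG_LEVEL1, BPF_LOG_STATS and so on come from the kernel's internal verifier headers and, as far as we know, are not exported by the UAPI linux/bpf.h, so this sketch defines its own hypothetical EX_LOG_* constants with the kernel's values (stats = 4, fixed = 8).

```c
#include <linux/types.h>

/* Hypothetical local copies of the log_level bits. */
#define EX_LOG_LEVEL1 1U          /* basic logging */
#define EX_LOG_LEVEL2 2U          /* verbose logging */
#define EX_LOG_STATS  (1U << 2)   /* append verifier statistics (v5.2+) */
#define EX_LOG_FIXED  (1U << 3)   /* v6.4+: truncate instead of rotating */

static __u32 make_log_level(unsigned level, int stats, int fixed)
{
    __u32 v = level & 3U;  /* the lower 2 bits hold the level */
    if (stats)
        v |= EX_LOG_STATS;
    if (fixed)
        v |= EX_LOG_FIXED;
    return v;
}
```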
log_size
This attribute indicates the size, in bytes, of the memory region pointed to by log_buf, which the kernel may safely write to.
log_buf
This attribute can be set to a pointer to a memory region allocated by the loader process, into which the verifier log will be written. The level of detail of the log is set by log_level. Besides the error code, the verifier log is often the only indication of why the command failed to load the program.
The log is also written on success. If the kernel runs out of space in the buffer while loading, on older kernels loading fails and the command returns the error code -ENOSPC. Callers had to take care to allocate enough buffer space, or to load the program repeatedly, increasing the buffer size each time -ENOSPC was returned.
Since v6.4 the kernel rotates the verifier log by default. The end of the log, which is usually the most interesting part, is always included. The old behavior can be requested by setting the BPF_LOG_FIXED flag in log_level.
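A minimal sketch of the grow-and-retry approach described above, for kernels that fail with -ENOSPC when the log does not fit. It assumes the caller has already filled in attr, including a non-zero log_level; next_log_size and load_with_log_retry are hypothetical helpers, and the initial size and cap are left to the caller.

```c
#include <errno.h>
#include <linux/bpf.h>
#include <stdlib.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Double the log size, capped at max_size (written to avoid overflow). */
static unsigned next_log_size(unsigned cur, unsigned max_size)
{
    return cur > max_size / 2 ? max_size : cur * 2;
}

/* Returns a program fd or a negative error number. *log_out receives the
 * last log buffer used (NULL on allocation failure); the caller frees it. */
static int load_with_log_retry(union bpf_attr *attr, unsigned size,
                               unsigned max_size, char **log_out)
{
    char *buf = NULL;
    int ret;

    for (;;) {
        char *tmp = realloc(buf, size);
        if (!tmp) {
            free(buf);
            *log_out = NULL;
            return -ENOMEM;
        }
        buf = tmp;
        memset(buf, 0, size);
        attr->log_buf = (__u64)(unsigned long)buf;
        attr->log_size = size;

        long r = syscall(__NR_bpf, BPF_PROG_LOAD, attr, sizeof(*attr));
        ret = r < 0 ? -errno : (int)r;
        /* Retry only when the log did not fit and the cap is not reached. */
        if (ret != -ENOSPC || size >= max_size)
            break;
        size = next_log_size(size, max_size);
    }
    *log_out = buf;
    return ret;
}
```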
kern_version
Warning
This attribute is no longer used as of v5.0. The field is still present for compatibility reasons but has no effect.
Before its deprecation, this field had to contain the version code of the running kernel (the value of LINUX_VERSION_CODE) for programs of type BPF_PROG_TYPE_KPROBE, and the kernel checked it against its own version. The rationale was that kprobe programs are fundamentally unstable and thus had to be recompiled for every kernel version (this was before CO-RE was introduced); requiring this field made that apparent to users.
The field was retired due to the introduction of CO-RE and the tendency of users and libraries to set this field automatically based on uname without actually recompiling.
prog_flags
This attribute is a bitmask of flags that modify how the program is loaded and verified. See the Flags section for details.
prog_name
This attribute specifies the name of the program. It is a 16-byte array which should be filled with a null-terminated string, leaving 15 characters for the name. Each character must be one of A-Z, a-z, 0-9, _ or . (period).
This name is reported back to the user in the output of BPF_OBJ_GET_INFO_BY_FD. The kernel makes no uniqueness guarantees, so the name is only useful as human-readable feedback.
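A sketch of preparing prog_name in user space before the call. The length limit follows from BPF_OBJ_NAME_LEN; the accepted character set (letters, digits, '_' and '.') reflects our reading of the kernel's bpf_obj_name_cpy(). set_prog_name is a hypothetical helper.

```c
#include <ctype.h>
#include <linux/bpf.h>
#include <string.h>

/* Copy name into a zero-filled prog_name buffer. Returns 0 on success, or -1
 * if the name is too long or contains characters the kernel would reject. */
static int set_prog_name(char dst[BPF_OBJ_NAME_LEN], const char *name)
{
    size_t len = strlen(name);
    if (len >= BPF_OBJ_NAME_LEN)   /* leave room for the terminating null */
        return -1;
    for (size_t i = 0; i < len; i++) {
        unsigned char c = (unsigned char)name[i];
        if (!isalnum(c) && c != '_' && c != '.')
            return -1;
    }
    memset(dst, 0, BPF_OBJ_NAME_LEN);
    memcpy(dst, name, len);
    return 0;
}
```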
prog_ifindex
This attribute specifies the index of the network interface the program is intended for. To offload a program to a network device, the user must set this field so that the device's driver can validate the program, in addition to the kernel verifier, and determine whether the device can offload it.
expected_attach_type
This attribute specifies the attach type the user expects to use when attaching the program. For certain program types, the attach type may change aspects like the context type that will be given, the meaning of return values, and which helper functions are or are not available. Therefore the verifier must know the attach type at load time to enforce correct behavior of the program being loaded.
The expected attach type is known to be important in the following cases:
- For BPF_PROG_TYPE_LSM programs, only programs attached with type BPF_LSM_CGROUP are allowed to use certain helper functions.
- For BPF_PROG_TYPE_TRACING programs, the attach type determines access to helper calls.
- For BPF_PROG_TYPE_CGROUP_SOCK_ADDR programs, the verifier restricts valid return values depending on the attach type.
- For BPF_PROG_TYPE_CGROUP_SKB programs, the verifier restricts valid return values depending on the attach type.
- For BPF_PROG_TYPE_CGROUP_SOCKOPT programs, the attach type determines the accessibility of certain context fields and helper functions.
- Only BPF_PROG_TYPE_XDP programs with the BPF_XDP_CPUMAP attach type can be added to the values of BPF_MAP_TYPE_CPUMAP maps.
- Only BPF_PROG_TYPE_XDP programs with the BPF_XDP_DEVMAP attach type can be added to the values of BPF_MAP_TYPE_DEVMAP maps.
Note
For BPF_PROG_TYPE_STRUCT_OPS programs, expected_attach_type doesn't contain a constant or enum value but rather the index of the member of the BTF struct specified by attach_btf_id that is to be replaced by this eBPF program.
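For example, an XDP program meant to be stored in the values of a BPF_MAP_TYPE_DEVMAP needs the matching expected attach type at load time. The sketch below only prepares those two attributes; init_devmap_xdp_attr is a hypothetical helper, and insns, insn_cnt and license still have to be filled in.

```c
#include <linux/bpf.h>
#include <string.h>

/* Prepare load attributes for an XDP program intended for a DEVMAP. */
static void init_devmap_xdp_attr(union bpf_attr *attr)
{
    memset(attr, 0, sizeof(*attr));                  /* leave unused fields zeroed */
    attr->prog_type = BPF_PROG_TYPE_XDP;
    attr->expected_attach_type = BPF_XDP_DEVMAP;     /* checked by the verifier at load time */
}
```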
prog_btf_fd
This attribute specifies the file descriptor of the BTF object which contains type information associated with the program being loaded.
Loading BTF for your program is optional, but highly recommended, since an ever-growing number of features require BTF to function properly.
func_info_rec_size
This attribute specifies the size of the records in func_info. This allows for compatibility between newer and older loaders and kernel versions if the size of the function info records ever changes.
func_info
This attribute specifies the memory region where extended BTF function info is located. This should be a pointer to an array of function info records, each func_info_rec_size bytes in size. The array should contain func_info_cnt of these records.
This function info contains the signatures of functions within the program and is used to validate that these signatures match the expected signatures when functions are used as callbacks for certain helper functions like bpf_loop and bpf_timer_set_callback.
func_info_cnt
This attribute specifies the number of function info records present in func_info.
line_info_rec_size
This attribute specifies the size of the records in line_info. This allows for compatibility between newer and older loaders and kernel versions if the size of the line info records ever changes.
line_info
This attribute specifies the memory region where extended BTF line info is located. This should be a pointer to an array of line info records, each line_info_rec_size bytes in size. The array should contain line_info_cnt of these records.
This line information associates each piece of eBPF code with information like the file name and path, line number, column number and a snippet of the source code that produced it. This information is shown in the verifier log to make its output easier to understand, and is also available in the output of BPF_OBJ_GET_INFO_BY_FD.
Note
The verifier also enforces that for every function info record, there also exists a line info record on the same instruction.
line_info_cnt
This attribute specifies the number of line info records present in line_info.
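A sketch of passing extended BTF info alongside the program, using the record structs from linux/bpf.h. The type ID and string offsets are placeholders: in practice they come from the .BTF and .BTF.ext sections emitted by the compiler, and btf_fd would be obtained by loading that BTF with BPF_BTF_LOAD. set_btf_ext_info is a hypothetical helper.

```c
#include <linux/bpf.h>

/* Placeholder records for a program with a single function at instruction 0. */
static struct bpf_func_info func_info[] = {
    { .insn_off = 0, .type_id = 3 },   /* type_id: placeholder BTF_KIND_FUNC id */
};
static struct bpf_line_info line_info[] = {
    /* A line info record at the function's first instruction, as the verifier
     * requires. line_col packs (line << 10) | column; offsets are placeholders. */
    { .insn_off = 0, .file_name_off = 1, .line_off = 20, .line_col = (12 << 10) | 5 },
};

static void set_btf_ext_info(union bpf_attr *attr, __u32 btf_fd)
{
    attr->prog_btf_fd = btf_fd;
    attr->func_info_rec_size = sizeof(struct bpf_func_info);
    attr->func_info = (__u64)(unsigned long)func_info;
    attr->func_info_cnt = sizeof(func_info) / sizeof(func_info[0]);
    attr->line_info_rec_size = sizeof(struct bpf_line_info);
    attr->line_info = (__u64)(unsigned long)line_info;
    attr->line_info_cnt = sizeof(line_info) / sizeof(line_info[0]);
}
```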
attach_btf_id
This attribute specifies the BTF type ID of the kernel type the program wishes to attach to. This ID refers to an ID within the vmlinux BTF object (or within the BTF object given by attach_btf_obj_fd), not the BTF object specified by prog_btf_fd. The meaning of this attribute depends on the program type.
- For BPF_PROG_TYPE_STRUCT_OPS, this attribute is the ID of the ops struct in which the user wants to replace a function pointer with an eBPF program.
- For BPF_PROG_TYPE_LSM, this attribute specifies the LSM hook the program is to be attached to.
attach_prog_fd
This attribute specifies the file descriptor of an already loaded eBPF program. It is used by BPF_PROG_TYPE_EXT programs to select which existing program should be extended.
attach_btf_obj_fd
This attribute specifies the file descriptor of a BTF object which the kernel should use instead of its built-in vmlinux BTF. It is mainly used to attach BTF-dependent program types, such as raw tracepoints, fentry/fexit and LSM programs, to kernel modules.
core_relo_cnt
This attribute specifies the number of CO-RE relocation records present in core_relos.
fd_array
This attribute specifies an array of file descriptors. The value should be a pointer to an array of 32-bit values containing file descriptors. With this feature, loaders do not have to rewrite the eBPF instructions to embed file descriptors, so the instruction blob from the ELF file can be signed. Instead, the instructions contain indices into this array, and the actual file descriptors, which may differ between runs of the loader, are not part of any signable blob.
If fd_array_cnt is 0, or on kernels without that attribute, the array can be sparse (not every element has to contain a valid file descriptor, only the indices referenced by instructions) and only map file descriptors are allowed.
If fd_array_cnt is set, the array must contain fd_array_cnt contiguous elements and all file descriptors must be valid. The array may contain file descriptors of maps and BTF objects. The program will take a reference on all file descriptors in the array, not just the ones referenced by instructions.
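A sketch of an instruction that refers to a map by its position in fd_array rather than by an embedded file descriptor, assuming headers new enough to define BPF_PSEUDO_MAP_IDX. ld_map_by_idx is a hypothetical helper that emits the two-slot 64-bit immediate load into r1.

```c
#include <linux/bpf.h>

/* Emit "r1 = map at fd_array[fd_array_idx]". BPF_LD | BPF_DW | BPF_IMM spans
 * two instruction slots; the second slot carries the upper 32 bits (zero). */
static void ld_map_by_idx(struct bpf_insn out[2], __u32 fd_array_idx)
{
    out[0] = (struct bpf_insn){
        .code = BPF_LD | BPF_DW | BPF_IMM,
        .dst_reg = BPF_REG_1,
        .src_reg = BPF_PSEUDO_MAP_IDX,   /* imm is an index into fd_array */
        .imm = (__s32)fd_array_idx,
    };
    out[1] = (struct bpf_insn){ 0 };
}
```

The loader would then point attr.fd_array at an int array whose element 3 holds the map's file descriptor, for example int fds[4] = { -1, -1, -1, map_fd }; attr.fd_array = (__u64)(unsigned long)fds; with fd_array_cnt left at 0 so the array may be sparse.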
core_relos
This attribute specifies the memory region where CO-RE relocations are located. This should be a pointer to an array of CO-RE relocation records, each core_relo_rec_size bytes in size. The array should contain core_relo_cnt of these records.
Before the addition of this field, CO-RE relocations had to be performed by the loader in userspace. This process modifies the eBPF program passed in via insns, which makes signing the program difficult. This field passes the CO-RE relocations to the kernel so that they can be applied after a potential program signature has been verified.
core_relo_rec_size
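This attribute specifies the size of the records in core_relos. This allows for compatibility between newer and older loaders and kernel versions if the size of the CO-RE relocation records ever changes.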
log_true_size
Since v6.4, the verifier log is no longer truncated to log_size; instead it is rotated such that the end of the log, usually the most interesting part, is always included.
This field is an output: the kernel sets it to the actual size of the full log, including the terminating null byte. If a user wants the full verifier log, this allows them to allocate a buffer of the correct size and call BPF_PROG_LOAD again with the same inputs (other than the resized buffer).
prog_token_fd
The file descriptor of a BPF token can be passed in this attribute, in which case the BPF_F_TOKEN_FD flag must also be set in prog_flags. If the BPF token grants permission to create a program of the type specified in prog_type, the kernel will allow the program to be loaded by a user without CAP_BPF.
fd_array_cnt
If set, fd_array must contain fd_array_cnt contiguous elements and all file descriptors must be valid. The array may contain file descriptors of maps and BTF objects. The program will take a reference on all file descriptors in the array, not just the ones referenced by instructions.
Flags
BPF_F_STRICT_ALIGNMENT
If BPF_F_STRICT_ALIGNMENT is used in the BPF_PROG_LOAD command, the verifier will perform strict alignment checking as if the kernel had been built with CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS not set, and NET_IP_ALIGN defined to 2.
BPF_F_ANY_ALIGNMENT
If BPF_F_ANY_ALIGNMENT is used in the BPF_PROG_LOAD command, the verifier will allow any alignment whatsoever. On platforms with strict alignment requirements for loads and stores (such as sparc and mips), the verifier validates that all loads and stores provably follow this requirement. This flag turns that checking and enforcement off.
It is mostly used for testing: when we want to validate the context and memory access aspects of the verifier, an unaligned access would otherwise trigger the alignment check before the check we are interested in.
BPF_F_TEST_RND_HI32
Warning
BPF_F_TEST_RND_HI32 is used for testing purposes and is not meant for production usage.
The verifier does sub-register def/use analysis and identifies instructions whose definition only matters for the low 32 bits, where the high 32 bits are never referenced later through implicit zero extension. The verifier therefore notifies JIT back-ends that it is safe to skip clearing the high 32 bits for these instructions. This saves some back-ends a lot of code generation. However, such optimization is not necessary on some architectures, for example x86_64 and arm64, whose JIT back-ends hence have not used the verifier's analysis result. But we still want a way to verify the correctness of the described optimization on x86_64, on which test suites are frequently exercised.
So this flag was introduced. When it is set, the verifier randomizes the high 32 bits of those instructions that have been identified as safe to ignore them. If the verifier's analysis is incorrect, this randomization will make tests fail and expose the bug.
BPF_F_TEST_STATE_FREQ
An internal verifier test flag, used for stress-testing state pruning.
Warning
Behavior is undefined.
BPF_F_SLEEPABLE
If BPF_F_SLEEPABLE is used in the BPF_PROG_LOAD command, the verifier will restrict map and helper usage for such programs. Sleepable BPF programs can only be attached to hooks where the kernel execution context allows sleeping. Such programs are allowed to use helpers that may sleep, like bpf_copy_from_user.
BPF_F_XDP_HAS_FRAGS
This flag notifies the kernel that the XDP program supports XDP fragments. If set, the XDP program may be called with a context that doesn't include the full packet in a single linear piece of memory, which breaks assumptions most XDP programs have, hence the flag.
For more details, see the XDP program type page.
BPF_F_XDP_DEV_BOUND_ONLY
If BPF_F_XDP_DEV_BOUND_ONLY is used in the BPF_PROG_LOAD command, the loaded program becomes device-bound but can access XDP metadata.
BPF_F_TEST_REG_INVARIANTS
An internal verifier test flag. Behavior is undefined.
BPF_F_TOKEN_FD
When set, the kernel will use the BPF token in prog_token_fd to authorize loading the program instead of checking the capabilities of the current user.