Map type BPF_MAP_TYPE_PERCPU_HASH
This is the per-CPU variant of the BPF_MAP_TYPE_HASH map type.
This map type is a generic map type with no restrictions on the structure of the key and value. Hash-maps are implemented using a hash table, allowing for lookups with arbitrary keys.
This per-CPU version has a separate hash map for each logical CPU. When the map is accessed through most helper functions, the hash map assigned to the CPU on which the eBPF program is currently running is used implicitly.
Since preemption is disabled during program execution, no other program can concurrently access the same memory. This avoids race conditions and improves performance, because no contention or synchronization logic is involved, at the cost of a larger memory footprint.
Note
The bpf_map_lookup_percpu_elem helper can be used to access maps assigned to other logical CPUs, which can negate the above-mentioned advantages.
Attributes
While the sizes of the key and value are essentially unrestricted, both value_size and key_size must be greater than zero, and their combined size must be no larger than KMALLOC_MAX_SIZE. KMALLOC_MAX_SIZE is the maximum size that can be allocated by the kernel memory allocator; its exact value depends on a number of factors. If this limit is exceeded, the map create syscall returns an -E2BIG error.
The max_entries attribute indicates the maximum number of entries per CPU, so the actual memory consumed also depends on the logical CPU count of the host.
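As a rough illustration of how the footprint scales with the CPU count, the sketch below estimates only the bytes needed for value storage. It assumes each value is padded to a multiple of 8 bytes and replicated once per possible CPU. Both function names are hypothetical, and keys, per-element headers, and the bucket array are deliberately left out, so real consumption will be higher.

```c
#include <stddef.h>
#include <stdint.h>

/* Round n up to the next multiple of 8 (assumed padding of per-CPU values). */
static size_t round_up_8(size_t n)
{
	return (n + 7) & ~(size_t)7;
}

/*
 * Hypothetical helper: estimate the bytes used for value storage alone.
 * Keys, element headers and the bucket array are not counted.
 */
static uint64_t percpu_value_bytes(uint32_t max_entries, uint32_t value_size,
				   uint32_t n_cpus)
{
	return (uint64_t)max_entries * round_up_8(value_size) * n_cpus;
}
```

For example, 1024 entries with a 12-byte value on a host with 8 possible CPUs would need 1024 * 16 * 8 = 131072 bytes for values alone under these assumptions.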
Syscall commands
The following syscall commands work with this map type:
BPF_MAP_LOOKUP_ELEM
BPF_MAP_LOOKUP_AND_DELETE_ELEM
BPF_MAP_UPDATE_ELEM
BPF_MAP_GET_NEXT_KEY
BPF_MAP_LOOKUP_BATCH
BPF_MAP_LOOKUP_AND_DELETE_BATCH
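When one of the lookup commands above is issued from userspace against a per-CPU map, the kernel returns one value per possible CPU, with each slot padded to a multiple of 8 bytes, so the caller usually has to aggregate the slots itself. The following is a minimal sketch of that aggregation step for a 64-bit counter. The function name is hypothetical, and the buffer is assumed to have been filled already, for example by libbpf's bpf_map_lookup_elem(fd, &key, buf) with n_cpus taken from libbpf_num_possible_cpus().

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper: sum a uint64_t counter over all per-CPU slots.
 * buf holds n_cpus slots; each slot is value_size rounded up to 8 bytes.
 * Assumes the counter sits at the start of the value (value_size >= 8).
 */
static uint64_t sum_percpu_u64(const void *buf, size_t value_size, int n_cpus)
{
	size_t stride = (value_size + 7) & ~(size_t)7;
	const unsigned char *p = buf;
	uint64_t total = 0;

	for (int cpu = 0; cpu < n_cpus; cpu++) {
		uint64_t v;

		/* memcpy avoids alignment and aliasing assumptions about buf */
		memcpy(&v, p + (size_t)cpu * stride, sizeof(v));
		total += v;
	}
	return total;
}
```

Sizing the buffer from the number of possible CPUs, rather than online CPUs, matters here: a buffer that is too small would be overrun by the kernel's copy.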
Helper functions
The following helper functions work with this map type:
bpf_for_each_map_elem
bpf_map_delete_elem
bpf_map_lookup_elem
bpf_map_lookup_percpu_elem
bpf_map_update_elem
Flags
The following flags are supported by this map type.
BPF_F_NO_PREALLOC
Hash maps are pre-allocated by default. This means that even a completely empty hash map will use the same amount of kernel memory as a full map.
If this flag is set, pre-allocation is disabled. Users might consider this for large maps since allocating large amounts of memory takes a lot of time during creation and might be undesirable.
Warning
The patch set [1] notes that not pre-allocating may cause issues in some edge cases, which was the original reason for defaulting to pre-allocation.
BPF_F_NUMA_NODE
While setting this flag is allowed, only a value of -1 is allowed in the numa_node attribute, which indicates no specific NUMA node. Since each logical CPU has its own hash table, it is impossible to allocate on only a single NUMA node.
BPF_F_RDONLY
Setting this flag will make it so the map can only be read via the syscall interface, but not written to.
For details please check the generic description.
BPF_F_WRONLY
Setting this flag will make it so the map can only be written to via the syscall interface, but not read from.
For details please check the generic description.
BPF_F_ZERO_SEED
Setting this flag will initialize the hash table with a seed of 0.
The hashing algorithm used by the hash table is seeded with a random number by default. This seeding is meant as a mitigation against Denial of Service attacks which could exploit the predictability of hashing implementations.
This random seed makes hash map operations inherently random in access time. This flag was introduced to make performance evaluation more consistent.
Warning
It is not recommended to use this flag in production due to the vulnerability to Denial of Service attacks.
Info
Only users with the CAP_SYS_ADMIN capability can use this flag; CAP_BPF is not enough due to the security risk associated with the flag.
BPF_F_RDONLY_PROG
Setting this flag will make it so the map can only be read via helper functions, but not written to.
For details please check the generic description.
BPF_F_WRONLY_PROG
Setting this flag will make it so the map can only be written to via helper functions, but not read from.
For details please check the generic description.
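The flags above are combined into a single map_flags bit mask at map creation. The sketch below shows one way a loader might sanity-check a combination before calling into the kernel. The function name and the EX_ constants are hypothetical; the constants are local copies of the bit values from the kernel's linux/bpf.h so that the sketch compiles without kernel headers. The two mutual-exclusion rules are an assumption based on the kernel's generic flag handling and are not stated in this section.

```c
#include <errno.h>
#include <stdint.h>

/* Local copies of the flag bits from <linux/bpf.h>, renamed for this sketch. */
enum {
	EX_F_NO_PREALLOC = 1u << 0,
	EX_F_NUMA_NODE   = 1u << 2,
	EX_F_RDONLY      = 1u << 3,
	EX_F_WRONLY      = 1u << 4,
	EX_F_ZERO_SEED   = 1u << 6,
	EX_F_RDONLY_PROG = 1u << 7,
	EX_F_WRONLY_PROG = 1u << 8,
};

/* Hypothetical pre-check; returns 0 if the combination looks acceptable. */
static int check_percpu_hash_flags(uint32_t flags, int numa_node)
{
	const uint32_t known = EX_F_NO_PREALLOC | EX_F_NUMA_NODE |
			       EX_F_RDONLY | EX_F_WRONLY | EX_F_ZERO_SEED |
			       EX_F_RDONLY_PROG | EX_F_WRONLY_PROG;

	/* Flags not listed for this map type are rejected. */
	if (flags & ~known)
		return -EINVAL;
	/* Per-CPU maps only accept "no specific node" (-1). */
	if ((flags & EX_F_NUMA_NODE) && numa_node != -1)
		return -EINVAL;
	/* Assumed: read-only and write-only cannot both be set. */
	if ((flags & EX_F_RDONLY) && (flags & EX_F_WRONLY))
		return -EINVAL;
	if ((flags & EX_F_RDONLY_PROG) && (flags & EX_F_WRONLY_PROG))
		return -EINVAL;
	return 0;
}
```

A check like this cannot cover the capability requirement of BPF_F_ZERO_SEED; that is still enforced by the kernel when the map is created.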