BPF Syscall BPF_MAP_CREATE command
The BPF_MAP_CREATE command is used to create a new BPF map.
Return value
On success, this command returns a file descriptor for the created map (a non-negative integer). If something went wrong, it returns an error number (a negative integer).
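The C library does not ship a wrapper for the bpf syscall, so callers usually go through syscall(2), which returns -1 and leaves the error code in errno. The following is a minimal sketch of turning that into the "file descriptor or negative error number" convention described above. The helper name sys_bpf_map_create is hypothetical and not part of any library.

```c
#include <errno.h>
#include <linux/bpf.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical helper: issues BPF_MAP_CREATE and returns the new map FD
 * (>= 0) on success, or a negative errno value on failure. */
static int sys_bpf_map_create(const union bpf_attr *attr)
{
    /* syscall(2) itself returns -1 and stores the error code in errno. */
    long ret = syscall(SYS_bpf, BPF_MAP_CREATE, attr, sizeof(*attr));

    return ret < 0 ? -errno : (int)ret;
}
```

A caller can then test the result with `if (fd < 0)` and pass `-fd` to strerror(3) to get a readable message.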
Attributes
map_type
This attribute specifies which type of map should be created. It should be one of the pre-defined map types.
key_size
This attribute specifies the size of the key in bytes.
Info
Some map types restrict which values are allowed. Check the documentation of the specific map type for more details.
value_size
This attribute specifies the size of the value in bytes.
Info
Some map types restrict which values are allowed. Check the documentation of the specific map type for more details.
max_entries
This attribute specifies the maximum number of entries the map can hold.
Info
Some map types restrict which values are allowed. Check the documentation of the specific map type for more details.
map_flags
This attribute is a bitmask of flags. See the flags section below for details.
inner_map_fd
This attribute should be set to the FD of another map when creating map-in-map type maps. Doing so doesn't link the specified inner map to the new map being created. Rather, it is a mechanism to inform the kernel of the inner map's attributes, such as type, key size, and value size. When map references are written as values to this map, the kernel verifies that those maps are compatible with the attributes of the map given via this field.
A known technique is to create a temporary map just to populate this field, and to release all references to it once the outer map has been created.
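The temporary-map technique can be sketched as follows. This is an illustration under assumptions, not a definitive implementation: the helper names fill_outer_attr and create_hash_of_maps are hypothetical, the inner map is assumed to be a one-entry array with 4-byte keys and 8-byte values, and the outer map is assumed to be a BPF_MAP_TYPE_HASH_OF_MAPS with 4-byte keys.

```c
#include <errno.h>
#include <linux/bpf.h>
#include <stdint.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Fill the attributes of an outer hash-of-maps whose inner maps must be
 * compatible with the template map referred to by inner_fd. */
static void fill_outer_attr(union bpf_attr *attr, int inner_fd,
                            uint32_t max_entries)
{
    memset(attr, 0, sizeof(*attr));
    attr->map_type = BPF_MAP_TYPE_HASH_OF_MAPS;
    attr->key_size = sizeof(uint32_t);
    attr->value_size = sizeof(uint32_t); /* user space writes map FDs as values */
    attr->max_entries = max_entries;
    attr->inner_map_fd = inner_fd;
}

/* Returns the outer map FD, or a negative errno value on failure. */
static int create_hash_of_maps(uint32_t max_entries)
{
    union bpf_attr inner, outer;
    int inner_fd, outer_fd;

    /* Temporary template map that only describes the inner map layout. */
    memset(&inner, 0, sizeof(inner));
    inner.map_type = BPF_MAP_TYPE_ARRAY;
    inner.key_size = sizeof(uint32_t);
    inner.value_size = sizeof(uint64_t);
    inner.max_entries = 1;
    inner_fd = syscall(SYS_bpf, BPF_MAP_CREATE, &inner, sizeof(inner));
    if (inner_fd < 0)
        return -errno;

    fill_outer_attr(&outer, inner_fd, max_entries);
    outer_fd = syscall(SYS_bpf, BPF_MAP_CREATE, &outer, sizeof(outer));
    if (outer_fd < 0)
        outer_fd = -errno;

    /* The template is no longer needed once the outer map exists. */
    close(inner_fd);
    return outer_fd;
}
```

Closing the template FD right away works because, as described above, the outer map only uses it as a description and does not keep a link to it.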
numa_node
This attribute specifies on which NUMA node the map should be located. Memory access within the same node is typically faster, so placing the map on the node that accesses it most can improve performance.
map_name
This attribute allows the map creator to give the map a human-readable name. The attribute is an array of 16 bytes in which a null-terminated string can be placed (thus limiting the name to 15 actual characters). This name stays associated with the map and is reported back in the results of the BPF_OBJ_GET_INFO_BY_* syscall commands.
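Because the field holds at most 15 characters plus the terminating null byte, longer names have to be shortened before the attribute is filled in. A minimal sketch, assuming a hypothetical helper called set_map_name that simply truncates:

```c
#include <linux/bpf.h>
#include <string.h>

/* Hypothetical helper: copy name into attr->map_name, truncating it so
 * that it fits the 16-byte field and stays null-terminated. */
static void set_map_name(union bpf_attr *attr, const char *name)
{
    size_t max = sizeof(attr->map_name) - 1; /* 15 usable characters */

    /* Zero the field first so the byte after the copied text is always 0. */
    memset(attr->map_name, 0, sizeof(attr->map_name));
    strncpy(attr->map_name, name, max);
}
```

Silent truncation is only one possible policy. A loader may prefer to reject names that are too long instead.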
map_ifindex
This attribute can be set to the index of a network interface to request that the map be offloaded to that network device. This requires the network interface to support eBPF offloading.
btf_fd
This attribute specifies the file descriptor of the BTF object which contains the key and value type info referenced by btf_key_type_id and btf_value_type_id.
Adding BTF information about the key and value types of the map allows tools like bpftool to pretty-print the map keys and values instead of just the binary blobs.
btf_key_type_id
This attribute specifies the BTF type ID of the map key within the BTF object indicated by btf_fd.
btf_value_type_id
This attribute specifies the BTF type ID of the map value within the BTF object indicated by btf_fd.
btf_vmlinux_value_type_id
This attribute is specifically used for the BPF_MAP_TYPE_STRUCT_OPS map type to indicate which structure in the kernel we wish to replicate using eBPF. For more details, please check the struct ops map page.
map_extra
This attribute specifies additional settings, the meaning of which is map type specific.
It has the following meanings per map type:
BPF_MAP_TYPE_BLOOM_FILTER
- The lowest 4 bits indicate the number of hash functions (if 0, the bloom filter will default to using 5 hash functions).
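The bloom filter encoding can be illustrated with a small decoding sketch. The function name bloom_nr_hash_funcs is hypothetical and only mirrors the rule stated above. It is not the kernel's implementation.

```c
#include <stdint.h>

/* Hypothetical helper: number of hash functions a bloom filter map would
 * use for a given map_extra value, following the rule described above. */
static unsigned int bloom_nr_hash_funcs(uint64_t map_extra)
{
    unsigned int n = (unsigned int)(map_extra & 0xF); /* lowest 4 bits */

    return n ? n : 5; /* 0 selects the default of 5 hash functions */
}
```

For example, setting `attr.map_extra = 3` when creating the map requests three hash functions, while leaving it at 0 yields the default.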
value_type_btf_obj_fd
This attribute is used when creating a BPF_MAP_TYPE_STRUCT_OPS map which attaches to an ops struct defined in a kernel module. It holds the file descriptor of the BTF object that contains the type referenced by btf_vmlinux_value_type_id. The BPF_F_VTYPE_BTF_OBJ_FD flag should be set to indicate to the kernel that the attribute is used.
map_token_fd
Docs could be improved
This part of the docs is incomplete, contributions are very welcome
Flags
BPF_F_NO_PREALLOC
Before kernel version v4.6, BPF_MAP_TYPE_HASH and BPF_MAP_TYPE_PERCPU_HASH maps were lazily allocated. To improve performance, the default has been switched to pre-allocation of such map types. However, this means that for large max_entries values a lot of unused memory is kept in reserve. Setting this flag disables pre-allocation for these maps.
Some map types require the loader to set this flag when creating maps to explicitly make clear that memory for such map types is always lazily allocated (also to guarantee stable behavior in case pre-allocation for those maps is ever added).
BPF_F_NO_COMMON_LRU
By default, LRU maps have a single LRU list (even per-CPU LRU maps). When this flag is set, an LRU map will use a per-CPU LRU list, which can scale and perform better.
Note
The LRU nodes (including free nodes) cannot be moved across different LRU lists.
BPF_F_NUMA_NODE
When set, the numa_node attribute is respected during map creation.
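Since the attribute and the flag have to be set together, a loader might fill them in as a pair. A minimal sketch, assuming a hash map with 4-byte keys, 8-byte values and 1024 entries (the helper name fill_numa_hash_attr and these sizes are illustrative choices):

```c
#include <linux/bpf.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: describe a hash map that should live on the
 * given NUMA node. */
static void fill_numa_hash_attr(union bpf_attr *attr, uint32_t node)
{
    memset(attr, 0, sizeof(*attr));
    attr->map_type = BPF_MAP_TYPE_HASH;
    attr->key_size = sizeof(uint32_t);
    attr->value_size = sizeof(uint64_t);
    attr->max_entries = 1024;
    attr->map_flags = BPF_F_NUMA_NODE; /* numa_node is only respected with this flag */
    attr->numa_node = node;
}
```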
BPF_F_RDONLY
Setting this flag will make it so the map can only be read via the syscall interface, but not written to.
This flag is mutually exclusive with BPF_F_WRONLY: one of them can be used, not both.
BPF_F_WRONLY
Setting this flag will make it so the map can only be written to via the syscall interface, but not read from.
This flag is mutually exclusive with BPF_F_RDONLY: one of them can be used, not both.
BPF_F_STACK_BUILD_ID
By default, BPF_MAP_TYPE_STACK_TRACE maps store an address for each entry in the call trace. To map these addresses to user space files, it is necessary to maintain the mapping from these virtual addresses to symbols in the binary.
When this flag is set, the stack trace map instead stores the ELF file build_id and offset for each entry.
For more details, check the stack trace map page.
BPF_F_ZERO_SEED
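Setting this flag zero-initializes the hash function seed instead of using a random one, which is intended for testing. This flag can be used in the following map types: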
BPF_MAP_TYPE_HASH
BPF_MAP_TYPE_PERCPU_HASH
BPF_MAP_TYPE_LRU_HASH
BPF_MAP_TYPE_LRU_PERCPU_HASH
BPF_MAP_TYPE_BLOOM_FILTER
BPF_F_RDONLY_PROG
Setting this flag will make it so the map can only be read via helper functions, but not written to.
This flag is mutually exclusive with BPF_F_WRONLY_PROG: one of them can be used, not both.
BPF_F_WRONLY_PROG
Setting this flag will make it so the map can only be written to via helper functions, but not read from.
This flag is mutually exclusive with BPF_F_RDONLY_PROG: one of them can be used, not both.
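The two mutual-exclusion rules (BPF_F_RDONLY versus BPF_F_WRONLY on the syscall side, BPF_F_RDONLY_PROG versus BPF_F_WRONLY_PROG on the program side) can be checked before issuing the syscall. A minimal sketch of such a pre-flight check, where the function name access_flags_valid is hypothetical:

```c
#include <linux/bpf.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical pre-flight check: returns true if map_flags does not
 * combine a read-only flag with its write-only counterpart. */
static bool access_flags_valid(uint32_t map_flags)
{
    const uint32_t syscall_side = BPF_F_RDONLY | BPF_F_WRONLY;
    const uint32_t prog_side = BPF_F_RDONLY_PROG | BPF_F_WRONLY_PROG;

    if ((map_flags & syscall_side) == syscall_side)
        return false; /* both syscall-side flags set */
    if ((map_flags & prog_side) == prog_side)
        return false; /* both program-side flags set */
    return true;
}
```

Mixing one flag from each side passes this check, for example BPF_F_RDONLY together with BPF_F_WRONLY_PROG, since the two pairs are described independently above.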
BPF_F_CLONE
This flag specifically applies to BPF_MAP_TYPE_SK_STORAGE maps. Sockets can be cloned. Setting this flag on the socket storage allows it to be cloned along with the socket itself when this happens. By default, the storage is not cloned and the socket storage on the cloned socket will stay empty.
BPF_F_MMAPABLE
Setting this flag on a BPF_MAP_TYPE_ARRAY will allow userspace programs to mmap the array values into the userspace process, effectively making a shared memory region between eBPF programs and a userspace program.
This can significantly improve read and write performance since there is no syscall overhead to access the map.
Using this flag is only supported on BPF_MAP_TYPE_ARRAY maps. For more details, check the array map page.
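On the userspace side, the shared region is obtained with mmap(2) on the map file descriptor. The sketch below assumes that each value slot is rounded up to 8 bytes and that the mapping length is rounded up to whole pages. Both helper names are hypothetical, and the sizing rule should be verified against the array map page before relying on it.

```c
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/* Length to request from mmap for an array map. Assumption: each value
 * slot is rounded up to 8 bytes and the total to a whole number of pages. */
static size_t mmapable_array_len(uint32_t value_size, uint32_t max_entries,
                                 size_t page_size)
{
    size_t slot = ((size_t)value_size + 7) & ~(size_t)7;
    size_t len = slot * max_entries;

    return (len + page_size - 1) / page_size * page_size;
}

/* Map the values of an array created with BPF_F_MMAPABLE.
 * Returns MAP_FAILED on error, like mmap itself. */
static void *mmap_array_values(int map_fd, uint32_t value_size,
                               uint32_t max_entries)
{
    size_t len = mmapable_array_len(value_size, max_entries,
                                    (size_t)sysconf(_SC_PAGESIZE));

    return mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, map_fd, 0);
}
```

Under these assumptions, the value for index `i` is found at byte offset `i * slot` inside the returned region, where `slot` is the value size rounded up to 8 bytes.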
BPF_F_PRESERVE_ELEMS
Maps of type BPF_MAP_TYPE_PERF_EVENT_ARRAY by default will clear all unread perf events when the original map file descriptor is closed, even if the map still exists. Setting this flag will make it so any pending elements will stay until explicitly removed or the map is freed. This makes sharing the perf event array between userspace programs easier.
BPF_F_INNER_MAP
Map-in-map maps normally require that all inner maps have the same max_entries value and that this value matches the max_entries of the map specified by inner_map_fd. Setting this flag on an inner map when creating it allows that map to be assigned to the outer map even if it has a different max_entries value. This comes at the cost of a slight performance hit during lookups.
BPF_F_LINK
The BPF_F_LINK flag is used to indicate that a link is to be used to attach a BPF_MAP_TYPE_STRUCT_OPS map to a callsite. If this flag is absent, the map is attached when the value of the map is set using BPF_MAP_UPDATE_ELEM and can be detached using BPF_MAP_DELETE_ELEM.
Note
This flag has a different meaning when used in the BPF_LINK_CREATE command.
BPF_F_PATH_FD
Docs could be improved
This part of the docs is incomplete, contributions are very welcome
BPF_F_VTYPE_BTF_OBJ_FD
The BPF_F_VTYPE_BTF_OBJ_FD flag is used to indicate that the value_type_btf_obj_fd attribute has been set.
BPF_F_TOKEN_FD
Docs could be improved
This part of the docs is incomplete, contributions are very welcome
BPF_F_SEGV_ON_FAULT
Docs could be improved
This part of the docs is incomplete, contributions are very welcome
BPF_F_NO_USER_CONV
Docs could be improved
This part of the docs is incomplete, contributions are very welcome
Example
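union bpf_attr my_map = {
    .map_type = BPF_MAP_TYPE_HASH,
    .key_size = sizeof(int),
    .value_size = sizeof(int),
    .max_entries = 100,
    .map_flags = BPF_F_NO_PREALLOC,
};
/* bpf() stands for a thin wrapper around syscall(SYS_bpf, cmd, attr, size);
 * the C library does not provide one. */
int fd = bpf(BPF_MAP_CREATE, &my_map, sizeof(my_map));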