
BPF Syscall BPF_MAP_CREATE command

v3.18

The BPF_MAP_CREATE command is used to create a new BPF map.

Return value

This command will return a file descriptor to the created map on success (positive integer) or an error number (negative integer) if something went wrong.
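As a rough illustration of handling this return value from C: glibc ships no `bpf()` wrapper, so the command is usually issued through `syscall(2)`, which reports failure as -1 with the reason stored in `errno`. The sketch below folds that into a single negative error number. The helper name `map_create` is hypothetical and not part of any library.

```c
#include <errno.h>
#include <linux/bpf.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical helper: returns a map FD (>= 0) or a negative errno value. */
int map_create(union bpf_attr *attr)
{
    long fd = syscall(SYS_bpf, BPF_MAP_CREATE, attr, sizeof(*attr));

    /* syscall(2) returns -1 on failure and stores the reason in errno. */
    return fd < 0 ? -errno : (int)fd;
}
```

A caller would zero the whole `union bpf_attr` before filling it in, then treat any negative result as `-errno` (for example `-EPERM` or `-EINVAL`).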

Attributes

union bpf_attr {
    struct {
        __u32   map_type;
        __u32   key_size;
        __u32   value_size;
        __u32   max_entries;
        __u32   map_flags;
        __u32   inner_map_fd;
        __u32   numa_node;
        char    map_name[BPF_OBJ_NAME_LEN];
        __u32   map_ifindex;
        __u32   btf_fd;
        __u32   btf_key_type_id;
        __u32   btf_value_type_id;
        __u32   btf_vmlinux_value_type_id;
        __u64   map_extra;
        __s32   value_type_btf_obj_fd;    
        __s32   map_token_fd;
    };
};

map_type

v3.18

This attribute specifies which type of map should be created. It should be one of the pre-defined map types.

key_size

v3.18

This attribute specifies the size of the key in bytes.

Info

Some map types restrict which values are allowed. Check the documentation of the specific map type for more details.

value_size

v3.18

This attribute specifies the size of the value in bytes.

Info

Some map types restrict which values are allowed. Check the documentation of the specific map type for more details.

max_entries

v3.18

This attribute specifies the maximum number of entries the map can hold.

Info

Some map types restrict which values are allowed. Check the documentation of the specific map type for more details.

map_flags

v4.6

This attribute is a bitmask of flags. See the flags section below for details.

inner_map_fd

v4.12

This attribute should be set to the FD of another map when creating map-in-map type maps. Doing so doesn't link the specified inner map to the new map we are creating. Rather, it is a mechanism to inform the kernel of the inner map's attributes, such as type, key size, and value size. When writing map references as values to this map, the kernel will verify that those maps are compatible with the attributes of the map given via this field.

A known technique is to create a pseudo/temporary map just for the purpose of informing this field and then releasing all references to it.
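The temporary-map technique could look roughly like the sketch below. The function names, the choice of `BPF_MAP_TYPE_ARRAY_OF_MAPS` as the outer type, and the sizes of the template map are all assumptions made for illustration.

```c
#include <linux/bpf.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Describe an array-of-maps whose inner maps must match the template FD. */
void fill_outer_attr(union bpf_attr *attr, int template_fd, __u32 slots)
{
    memset(attr, 0, sizeof(*attr));
    attr->map_type = BPF_MAP_TYPE_ARRAY_OF_MAPS;
    attr->key_size = sizeof(__u32);   /* array index */
    attr->value_size = sizeof(__u32); /* inner maps are written as FDs */
    attr->max_entries = slots;
    attr->inner_map_fd = template_fd;
}

/* Create the outer map from a throwaway template; returns an FD or -1. */
int create_outer_map(__u32 slots)
{
    union bpf_attr tmpl, outer;
    int tmpl_fd, outer_fd;

    /* Template inner map: its type, sizes and max_entries act as the model. */
    memset(&tmpl, 0, sizeof(tmpl));
    tmpl.map_type = BPF_MAP_TYPE_ARRAY;
    tmpl.key_size = sizeof(__u32);
    tmpl.value_size = sizeof(__u64);
    tmpl.max_entries = 1;
    tmpl_fd = syscall(SYS_bpf, BPF_MAP_CREATE, &tmpl, sizeof(tmpl));
    if (tmpl_fd < 0)
        return -1;

    fill_outer_attr(&outer, tmpl_fd, slots);
    outer_fd = syscall(SYS_bpf, BPF_MAP_CREATE, &outer, sizeof(outer));

    /* The template is no longer needed once the outer map exists. */
    close(tmpl_fd);
    return outer_fd;
}
```

Closing the template FD releases the last user space reference to it, which matches the "release all references" step described above.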

numa_node

v4.14

This attribute specifies on which NUMA node the map should be located. Memory access within the same node is typically faster, which can lead to optimization if applied correctly.

map_name

v4.15

This attribute allows the map creator to give it a human readable name. The attribute is an array of 16 bytes in which a null terminated string can be placed (thus limiting the name to 15 actual characters). This name will stay associated with the map and is reported back in the results of BPF_OBJ_GET_INFO_BY_* syscall commands.
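To make the 15-character limit concrete, here is a minimal sketch of copying a name into the attribute with truncation. The helper name `set_map_name` is hypothetical. The kernel may additionally reject names containing certain characters, which this sketch does not check.

```c
#include <linux/bpf.h>
#include <stdio.h>

/* Copy a name into attr->map_name, truncating to BPF_OBJ_NAME_LEN - 1 chars. */
void set_map_name(union bpf_attr *attr, const char *name)
{
    /* snprintf always null-terminates within the given buffer size. */
    snprintf(attr->map_name, sizeof(attr->map_name), "%s", name);
}
```

With this helper, a name longer than 15 characters is silently shortened rather than overflowing the 16-byte array.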

map_ifindex

v4.16

This attribute can be set to the index of a network interface to request that the map be offloaded to that network device. This does require that network interface to support eBPF offloading.
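One plausible way to obtain the interface index is `if_nametoindex(3)`. The sketch below assumes the caller knows the device name; the helper name `set_offload_device` is hypothetical.

```c
#include <linux/bpf.h>
#include <net/if.h>

/* Request offload to the named device; returns 0, or -1 if it does not exist. */
int set_offload_device(union bpf_attr *attr, const char *ifname)
{
    unsigned int ifindex = if_nametoindex(ifname); /* 0 means "not found" */

    if (ifindex == 0)
        return -1;
    attr->map_ifindex = ifindex;
    return 0;
}
```

Map creation can still fail afterwards if the device or its driver does not support offloading.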

btf_fd

v4.18

This attribute specifies the file descriptor of the BTF object which contains the key and value type info referenced by btf_key_type_id and btf_value_type_id.

Adding BTF information about the key and value types of the map allows tools like bpftool to pretty-print the map keys and values instead of just the binary blobs.

btf_key_type_id

v4.18

This attribute specifies the BTF type ID of the map key within the BTF object indicated by btf_fd.

btf_value_type_id

v4.18

This attribute specifies the BTF type ID of the map value within the BTF object indicated by btf_fd.

btf_vmlinux_value_type_id

v5.6

This attribute is specifically used for the BPF_MAP_TYPE_STRUCT_OPS map type to indicate which structure in the kernel we wish to replicate using eBPF. For more details please check the struct ops map page.

map_extra

v5.16

This attribute specifies additional settings, the meaning of which is map type specific.

It has the following meanings per map type:

  • BPF_MAP_TYPE_BLOOM_FILTER - The lowest 4 bits indicate the number of hash functions (if 0, the bloom filter will default to using 5 hash functions).
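The bloom filter rule above can be restated as a small function. This is only a sketch of the described behavior, not kernel code, and the function name is hypothetical.

```c
#include <stdint.h>

/* Number of hash functions a bloom filter map would use for a given map_extra. */
unsigned int bloom_nr_hash_funcs(uint64_t map_extra)
{
    unsigned int nr = map_extra & 0xF; /* lowest 4 bits */

    return nr ? nr : 5; /* 0 selects the default of 5 */
}
```

For example, setting `map_extra` to 3 requests three hash functions, while leaving it at 0 yields the default of five.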

value_type_btf_obj_fd

v6.9

This attribute is used when creating a BPF_MAP_TYPE_STRUCT_OPS map which attaches to an ops struct defined in a kernel module. The BPF_F_VTYPE_BTF_OBJ_FD flag should be set to indicate to the kernel that the attribute is used.

map_token_fd

The file descriptor of a BPF token can be passed to this attribute. If the BPF token grants permission to create a map of the type specified in map_type, the kernel will allow the map to be created for a user without CAP_BPF.

Flags

BPF_F_NO_PREALLOC

v4.6

Before kernel version v4.6, BPF_MAP_TYPE_HASH and BPF_MAP_TYPE_PERCPU_HASH hash maps were lazily allocated. To improve performance, the default has been switched to pre-allocation of such map types. However, this means that for large max_entries values a lot of unused memory is kept in reserve. Setting this flag will not pre-allocate these maps.

Some map types require the loader to set this flag when creating maps to explicitly make clear that memory for such map types is always lazily allocated (also to guarantee stable behavior in case pre-allocation for those maps is ever added).

BPF_F_NO_COMMON_LRU

v4.10

By default, LRU maps have a single LRU list (even per-CPU LRU maps). When this flag is set, an LRU map will use a per-CPU LRU list, which performs better in certain cases but also has implications. See BPF_MAP_TYPE_LRU_HASH for more details.

Note

The LRU nodes (including free nodes) cannot be moved across different LRU lists.

BPF_F_NUMA_NODE

v4.14

When set, the numa_node attribute is respected during map creation.
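Since the attribute is only honored together with the flag, a loader would typically set both at once. A minimal sketch, with a hypothetical helper name:

```c
#include <linux/bpf.h>

/* Pin the map's memory to a NUMA node; the flag makes numa_node take effect. */
void set_numa_node(union bpf_attr *attr, __u32 node)
{
    attr->map_flags |= BPF_F_NUMA_NODE;
    attr->numa_node = node;
}
```

The flag is OR-ed in so that any flags already present in `map_flags` are preserved.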

BPF_F_RDONLY

v4.15

Setting this flag will make it so the map can only be read via the syscall interface, but not written to.

This flag is mutually exclusive with BPF_F_WRONLY; only one of the two can be set.

BPF_F_WRONLY

v4.15

Setting this flag will make it so the map can only be written to via the syscall interface, but not read from.

This flag is mutually exclusive with BPF_F_RDONLY; only one of the two can be set.

BPF_F_STACK_BUILD_ID

v4.17

By default, BPF_MAP_TYPE_STACK_TRACE maps store an address for each entry in the call trace. To map these addresses to user space files, it is necessary to maintain the mapping from these virtual addresses to symbols in the binary.

When this flag is set, the stack trace map will instead store the ELF file build_id and offset for each entry.

For more details, check the stack trace map page.
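A rough sketch of attributes for such a map follows. It assumes that each frame is stored as one `struct bpf_stack_build_id` (as declared in `linux/bpf.h`) and sizes the value accordingly. The helper name and its parameters are hypothetical.

```c
#include <linux/bpf.h>
#include <string.h>

/* Fill attributes for a stack trace map that stores build_id + offset. */
void fill_build_id_stack_attr(union bpf_attr *attr, __u32 depth, __u32 entries)
{
    memset(attr, 0, sizeof(*attr));
    attr->map_type = BPF_MAP_TYPE_STACK_TRACE;
    attr->key_size = sizeof(__u32); /* stack ID */
    /* One struct bpf_stack_build_id per frame instead of one __u64 address. */
    attr->value_size = depth * sizeof(struct bpf_stack_build_id);
    attr->max_entries = entries;
    attr->map_flags = BPF_F_STACK_BUILD_ID;
}
```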

BPF_F_ZERO_SEED

v5.0

This flag can be used in the following map types:

BPF_F_RDONLY_PROG

v5.2

Setting this flag will make it so the map can only be read via helper functions, but not written to.

This flag is mutually exclusive with BPF_F_WRONLY_PROG; only one of the two can be set.

BPF_F_WRONLY_PROG

v5.2

Setting this flag will make it so the map can only be written to via helper functions, but not read from.

This flag is mutually exclusive with BPF_F_RDONLY_PROG; only one of the two can be set.
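The mutual exclusion rules for the syscall-side pair (BPF_F_RDONLY, BPF_F_WRONLY) and the program-side pair (BPF_F_RDONLY_PROG, BPF_F_WRONLY_PROG) can be checked in user space before calling into the kernel. The function below is an illustrative sketch with a hypothetical name, not the kernel's own check.

```c
#include <linux/bpf.h>
#include <stdbool.h>

/* True unless both flags of a mutually exclusive pair are set. */
bool access_flags_ok(__u32 flags)
{
    const __u32 syscall_pair = BPF_F_RDONLY | BPF_F_WRONLY;
    const __u32 prog_pair = BPF_F_RDONLY_PROG | BPF_F_WRONLY_PROG;

    if ((flags & syscall_pair) == syscall_pair)
        return false;
    if ((flags & prog_pair) == prog_pair)
        return false;
    return true;
}
```

Flags from different pairs can be combined, for instance a map that is read-only from user space and write-only from programs.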

BPF_F_CLONE

v5.4

This flag specifically applies to BPF_MAP_TYPE_SK_STORAGE maps. Sockets can be cloned. Setting this flag on the socket storage allows it to be cloned along with the socket itself when this happens. By default the storage is not cloned and the socket storage on the cloned socket will stay empty.

BPF_F_MMAPABLE

v5.5

Setting this flag on a BPF_MAP_TYPE_ARRAY will allow userspace programs to mmap the array values into the userspace process, effectively making a shared memory region between eBPF programs and a userspace program.

This can significantly improve read and write performance since there is no syscall overhead to access the map.

This flag is only supported on BPF_MAP_TYPE_ARRAY maps. For more details, check the array map page.
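A sketch of the user space side of such a mapping is shown below. The length calculation (values rounded up to 8 bytes, total rounded up to a whole page) is an assumption about how array values are laid out and should be checked against the array map documentation. Both function names are hypothetical.

```c
#include <stddef.h>
#include <sys/mman.h>

/* Bytes to map: 8-byte-aligned values times entries, rounded up to a page. */
size_t mmapable_array_len(size_t value_size, size_t max_entries, size_t page_size)
{
    size_t elem_size = (value_size + 7) & ~(size_t)7;
    size_t len = elem_size * max_entries;

    return (len + page_size - 1) / page_size * page_size;
}

/* Map the values of a BPF_F_MMAPABLE array; returns MAP_FAILED on error. */
void *map_array_values(int map_fd, size_t len)
{
    return mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, map_fd, 0);
}
```

The page size would normally come from `sysconf(_SC_PAGESIZE)`, and the returned pointer can then be indexed like an ordinary array of 8-byte-aligned elements.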

BPF_F_PRESERVE_ELEMS

v5.10

Maps of type BPF_MAP_TYPE_PERF_EVENT_ARRAY by default will clear all unread perf events when the original map file descriptor is closed, even if the map still exists. Setting this flag will make it so any pending elements will stay until explicitly removed or the map is freed. This makes sharing the perf event array between userspace programs easier.

BPF_F_INNER_MAP

v5.10

Map-in-Map maps normally require that all inner maps have the same max_entries value and that this value matches the max_entries of the map specified by inner_map_fd. Setting this flag on the inner map value when loading will allow you to assign that map to the outer map even if it has a different max_entries value. This is at the cost of a slight hit to performance during lookups.

BPF_F_LINK

v6.4

The BPF_F_LINK flag is used to indicate that a link is to be used to attach a BPF_MAP_TYPE_STRUCT_OPS map to a callsite. If this flag is absent, the map is attached when the value of the map is set using BPF_MAP_UPDATE_ELEM and can be detached using BPF_MAP_DELETE_ELEM.

Note

This flag has a different meaning when used in the BPF_LINK_CREATE command.

BPF_F_VTYPE_BTF_OBJ_FD

v6.9

The BPF_F_VTYPE_BTF_OBJ_FD flag is used to indicate that the value_type_btf_obj_fd attribute has been set.

BPF_F_TOKEN_FD

v6.9

When set, the kernel will use the BPF token in map_token_fd to authorize the creation of the map instead of checking the capabilities of the current user.

BPF_F_SEGV_ON_FAULT

v6.9

Docs could be improved

This part of the docs is incomplete, contributions are very welcome

BPF_F_NO_USER_CONV

v6.9

Docs could be improved

This part of the docs is incomplete, contributions are very welcome

Example

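#include <linux/bpf.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Fields that are not named here are zero-initialized. */
union bpf_attr my_map = {
    .map_type = BPF_MAP_TYPE_HASH,
    .key_size = sizeof(int),
    .value_size = sizeof(int),
    .max_entries = 100,
    .map_flags = BPF_F_NO_PREALLOC,
};
/* glibc has no bpf() wrapper, so the command is issued through syscall(2). */
int fd = syscall(SYS_bpf, BPF_MAP_CREATE, &my_map, sizeof(my_map));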