Map type BPF_MAP_TYPE_LPM_TRIE
The LPM (Largest Prefix Match) map is a generic map type which does prefix matching on the key upon lookup.
Usage
One of the main use cases for this map type is to implement routing tables or policies for IP ranges. Take the following key-value pairs:
10.0.0.0/8
-> 110.0.10.0/24
-> 210.0.10.123/32
-> 3
A lookup for 10.0.10.123
will return value 3 because we have a specific entry for it in the map. A lookup for 10.0.10.200
will return value 2, because the /24
key is more specific than the /8
key. A lookup for 10.12.0.1
would return 1. And a lookup for 12.0.0.1
will not return any entry.
Attributes
The value_size
is unrestricted. The key_size
is variable, but must adhere to a specific structure. The first 4 bytes of the key indicate the prefix length of the key (the number after the /
in a CIDR), the rest of the bytes are the value.
The bpf.h
header file provides the following struct as example:
struct bpf_lpm_trie_key {
__u32 prefixlen; /* up to 32 for AF_INET, 128 for AF_INET6 */
__u8 data[0]; /* Arbitrary size */
};
A common trick is to embed this struct in a custom key type:
struct lpm_key {
struct bpf_lpm_trie_key trie_key;
__u32 data;
};
But this is not required, as long as the first field is a __u32
:
struct lpm_key {
__u32 prefixlen;
__be32 addr;
};
LPM tries may be created with a maximum prefix length that is a multiple of 8, in the range from 8 to 2048. So the data
part of the key can't be larger than 256 bytes (4 or 16 is normal for IPv4 and IPv6 addresses).
Data stored in @data of struct
bpf_lpm_key
and structlpm_trie_node
is interpreted as big endian, so data[0] stores the most significant byte. [...] one single element that matches 192.168.0.0/16. The data array would hence contain[0xc0, 0xa8, 0x00, 0x00]
in big-endian notation.
Note
The BPF_F_NO_PREALLOC
flag must always be set when creating this map type since the implementation cannot pre-allocate the map.
Syscall commands
The following syscall commands work with this map type:
Helper functions
The following helper functions work with this map type:
Flags
The following flags are supported by this map type.
BPF_F_NO_PREALLOC
Hash maps are pre-allocated by default, this means that even a completely empty hash map will use the same amount of kernel memory as a full map.
If this flag is set, pre-allocation is disabled. Users might consider this for large maps since allocating large amounts of memory takes a lot of time during creation and might be undesirable.
Note
Setting this flag during map creation is mandatory for the LPM map type.
BPF_F_NUMA_NODE
When set, the numa_node
attribute is respected during map creation.
BPF_F_RDONLY
Setting this flag will make it so the map can only be read via the syscall interface, but not written to.
For details please check the generic description.
BPF_F_WRONLY
Setting this flag will make it so the map can only be written to via the syscall interface, but not read from.
For details please check the generic description.
BPF_F_RDONLY_PROG
Setting this flag will make it so the map can only be read via helper functions, but not written to.
For details please check the generic description.
BPF_F_WRONLY_PROG
Setting this flag will make it so the map can only be written to via helper functions, but not read from.
For details please check the generic description.
Example
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>
struct ipv4_lpm_key {
__u32 prefixlen;
__u32 data;
};
struct {
__uint(type, BPF_MAP_TYPE_LPM_TRIE);
__type(key, struct ipv4_lpm_key);
__type(value, __u32);
__uint(map_flags, BPF_F_NO_PREALLOC);
__uint(max_entries, 255);
} ipv4_lpm_map SEC(".maps");
void *lookup(__u32 ipaddr)
{
struct ipv4_lpm_key key = {
.prefixlen = 32,
.data = ipaddr
};
return bpf_map_lookup_elem(&ipv4_lpm_map, &key);
}