Program type BPF_PROG_TYPE_SK_LOOKUP
The socket lookup program allows an eBPF program to pick which socket to send traffic to irrespective of how that target socket has been bound.
The primary use case for this program type is to allow a single program to handle traffic for network patterns which cannot be expressed with the normal bind syscall. For example, a single socket can be bound to a whole /24 network CIDR (bind only allows for single IPs, or you have to set it to 0.0.0.0, which is not desirable if another application should answer a different range of IPs). Or a single socket can listen on any port for a given IP.
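The /24 case comes down to a prefix comparison on the destination address. The following is a minimal sketch of such a check in plain C, not code from the kernel or from this page: the name ip4_in_prefix and its parameters are hypothetical, and both addresses are assumed to be in network byte order.

```c
#include <arpa/inet.h>  /* htonl */
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: report whether addr falls inside net/prefix_len.
 * Both addr and net are IPv4 addresses in network byte order.
 */
bool ip4_in_prefix(uint32_t addr, uint32_t net, unsigned int prefix_len)
{
	uint32_t mask;

	assert(prefix_len <= 32);

	/* A shift by 32 is undefined in C, so handle /0 separately. */
	mask = prefix_len ? htonl(UINT32_MAX << (32 - prefix_len)) : 0;

	return (addr & mask) == (net & mask);
}
```

Inside a BPF program the same comparison would presumably be applied to the local IPv4 address from the context, using bpf_htonl from bpf/bpf_endian.h instead of htonl and without the assert.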
Usage
Socket lookup programs are typically put into an ELF section prefixed with sk_lookup. They are invoked by the transport layer when looking up a listening socket for a new connection request (for connection-oriented protocols), or when looking up an unconnected socket for a packet (for connectionless protocols).
The socket lookup program acts as a filter: if it returns SK_DROP (0), the connection or packet is dropped. If it returns SK_PASS (1) without selecting a socket, the normal lookup behavior is used. However, the program can also choose to assign a specific socket with the bpf_sk_assign helper function.
Context
Socket lookup programs are called with the struct bpf_sk_lookup context.
C structure
struct bpf_sk_lookup {
	union {
		__bpf_md_ptr(struct bpf_sock *, sk); /* Selected socket */
		__u64 cookie; /* Non-zero if socket was selected in PROG_TEST_RUN */
	};
	__u32 family;          /* Protocol family (AF_INET, AF_INET6) */
	__u32 protocol;        /* IP protocol (IPPROTO_TCP, IPPROTO_UDP) */
	__u32 remote_ip4;      /* Network byte order */
	__u32 remote_ip6[4];   /* Network byte order */
	__be16 remote_port;    /* Network byte order */
	__u16 :16;             /* Zero padding */
	__u32 local_ip4;       /* Network byte order */
	__u32 local_ip6[4];    /* Network byte order */
	__u32 local_port;      /* Host byte order */
	__u32 ingress_ifindex; /* The arriving interface. Determined by inet_iif. */
};
sk
This field is a pointer to the selected socket. The field is read-only, but can be updated via the bpf_sk_assign helper function.
cookie
This field is set to the cookie of the assigned socket if the program assigns one during a PROG_TEST_RUN.
family
The address family of the connection/packet for which the program is invoked. Can be AF_INET or AF_INET6.
protocol
The transport layer protocol of the connection/packet for which the program is invoked. Can be IPPROTO_TCP or IPPROTO_UDP.
remote_ip4
The remote IPv4 address of the connection/packet for which the program is invoked.
remote_ip6
The remote IPv6 address of the connection/packet for which the program is invoked.
remote_port
The remote port of the connection/packet for which the program is invoked, in network byte order.
local_ip4
The local IPv4 address of the connection/packet for which the program is invoked.
local_ip6
The local IPv6 address of the connection/packet for which the program is invoked.
local_port
The local port of the connection/packet for which the program is invoked, in host byte order.
ingress_ifindex
The network interface index of the network interface on which the packet arrived.
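The two port fields differ in byte order: remote_port is in network byte order while local_port is in host byte order, as the comments in the context structure state. The sketch below shows, in plain userspace C, the conversions needed to print both endpoints of an IPv4 lookup. The names struct lookup_tuple and format_tuple are hypothetical; the struct mirrors only the IPv4 fields of the context for illustration.

```c
#include <arpa/inet.h>   /* inet_ntop, ntohs */
#include <assert.h>
#include <netinet/in.h>  /* INET_ADDRSTRLEN */
#include <stdint.h>
#include <stdio.h>       /* snprintf */
#include <sys/socket.h>  /* AF_INET */

/* Hypothetical mirror of the IPv4 fields of struct bpf_sk_lookup. */
struct lookup_tuple {
	uint32_t remote_ip4;  /* network byte order */
	uint16_t remote_port; /* network byte order */
	uint32_t local_ip4;   /* network byte order */
	uint32_t local_port;  /* host byte order */
};

/*
 * Write "remote_ip:port -> local_ip:port" into buf.
 * Returns the snprintf result, or -1 if an address cannot be formatted.
 */
int format_tuple(const struct lookup_tuple *t, char *buf, size_t len)
{
	char remote[INET_ADDRSTRLEN], local[INET_ADDRSTRLEN];

	assert(t != NULL && buf != NULL);

	/* inet_ntop expects the address in network byte order, as stored. */
	if (!inet_ntop(AF_INET, &t->remote_ip4, remote, sizeof(remote)) ||
	    !inet_ntop(AF_INET, &t->local_ip4, local, sizeof(local)))
		return -1;

	/* Only remote_port needs ntohs; local_port is already host order. */
	return snprintf(buf, len, "%s:%u -> %s:%u",
			remote, (unsigned int)ntohs(t->remote_port),
			local, (unsigned int)t->local_port);
}
```

Comparing local_port against a constant therefore needs no conversion, whereas a comparison against remote_port needs the constant converted to network byte order (or the field converted to host byte order) first.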
Attachment
This program type must always be loaded with the expected_attach_type of BPF_SK_LOOKUP.
Socket lookup programs are attached to a network namespace using a link. When creating the link, prog_fd should be set to the file descriptor of the program, target_fd to the file descriptor of a network namespace, and attach_type to BPF_SK_LOOKUP.
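The link creation described above can be sketched with the raw bpf syscall. This is a minimal illustration, assuming kernel UAPI headers recent enough to define BPF_LINK_CREATE and BPF_SK_LOOKUP; the function names sk_lookup_link_attr and sk_lookup_attach are hypothetical.

```c
#include <assert.h>
#include <linux/bpf.h>    /* union bpf_attr, BPF_LINK_CREATE, BPF_SK_LOOKUP */
#include <string.h>       /* memset */
#include <sys/syscall.h>  /* SYS_bpf */
#include <unistd.h>       /* syscall */

/* Fill the link_create attributes for an sk_lookup attachment. */
void sk_lookup_link_attr(union bpf_attr *attr, int prog_fd, int netns_fd)
{
	assert(attr != NULL);

	/* Unused bytes of the attribute union must be zero. */
	memset(attr, 0, sizeof(*attr));
	attr->link_create.prog_fd = prog_fd;
	attr->link_create.target_fd = netns_fd;
	attr->link_create.attach_type = BPF_SK_LOOKUP;
}

/*
 * Attach a loaded sk_lookup program to a network namespace.
 * Returns a link file descriptor, or -1 with errno set.
 */
int sk_lookup_attach(int prog_fd, int netns_fd)
{
	union bpf_attr attr;

	sk_lookup_link_attr(&attr, prog_fd, netns_fd);
	return (int)syscall(SYS_bpf, BPF_LINK_CREATE, &attr, sizeof(attr));
}
```

A network namespace file descriptor can be obtained by opening a namespace file such as /proc/self/ns/net. Programs loaded through libbpf would more likely use its bpf_program__attach_netns wrapper than the raw syscall.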
Example
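// Copyright (c) 2020 Cloudflare
#include <linux/bpf.h>
#include <sys/socket.h> /* AF_INET */
#include <bpf/bpf_endian.h>
#include <bpf/bpf_helpers.h>

/* Build an IPv4 address in network byte order from its four octets. */
#define IP4(a, b, c, d)					\
	bpf_htonl((((__u32)(a) & 0xffU) << 24) |	\
		  (((__u32)(b) & 0xffU) << 16) |	\
		  (((__u32)(c) & 0xffU) <<  8) |	\
		  (((__u32)(d) & 0xffU) <<  0))

struct {
	__uint(type, BPF_MAP_TYPE_SOCKMAP);
	__uint(max_entries, 32);
	__type(key, __u32);
	__type(value, __u64);
} redir_map SEC(".maps");

static const __u16 DST_PORT = 7007; /* Host byte order */
static const __u32 DST_IP4 = IP4(127, 0, 0, 1);
static const __u32 KEY_SERVER_A = 0;

/* Redirect packets destined for DST_IP4 address to socket at redir_map[0]. */
SEC("sk_lookup")
int redir_ip4(struct bpf_sk_lookup *ctx)
{
	struct bpf_sock *sk;
	int err;

	/* Leave everything that is not IPv4 traffic to DST_IP4:DST_PORT alone. */
	if (ctx->family != AF_INET)
		return SK_PASS;
	if (ctx->local_port != DST_PORT)
		return SK_PASS;
	if (ctx->local_ip4 != DST_IP4)
		return SK_PASS;

	sk = bpf_map_lookup_elem(&redir_map, &KEY_SERVER_A);
	if (!sk)
		return SK_PASS;

	/* Select the socket, then drop the reference taken by the lookup. */
	err = bpf_sk_assign(ctx, sk, 0);
	bpf_sk_release(sk);
	return err ? SK_DROP : SK_PASS;
}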
Helper functions
Not all helper functions are available in all program types. These are the helper calls available for socket lookup programs:
Supported helper functions
bpf_perf_event_output
bpf_sk_assign
bpf_sk_release
bpf_skc_to_tcp6_sock
bpf_skc_to_tcp_sock
bpf_skc_to_tcp_timewait_sock
bpf_skc_to_tcp_request_sock
bpf_skc_to_udp6_sock
bpf_skc_to_unix_sock
bpf_ktime_get_coarse_ns
bpf_map_lookup_elem
bpf_map_update_elem
bpf_map_delete_elem
bpf_map_push_elem
bpf_map_pop_elem
bpf_map_peek_elem
bpf_map_lookup_percpu_elem
bpf_get_prandom_u32
bpf_get_smp_processor_id
bpf_get_numa_node_id
bpf_tail_call
bpf_ktime_get_ns
bpf_ktime_get_boot_ns
bpf_ringbuf_output
bpf_ringbuf_reserve
bpf_ringbuf_submit
bpf_ringbuf_discard
bpf_ringbuf_query
bpf_for_each_map_elem
bpf_loop
bpf_strncmp
bpf_spin_lock
bpf_spin_unlock
bpf_jiffies64
bpf_per_cpu_ptr
bpf_this_cpu_ptr
bpf_timer_init
bpf_timer_set_callback
bpf_timer_start
bpf_timer_cancel
bpf_trace_printk
bpf_get_current_task
bpf_get_current_task_btf
bpf_probe_read_user
bpf_probe_read_kernel
bpf_probe_read_user_str
bpf_probe_read_kernel_str
bpf_snprintf_btf
bpf_snprintf
bpf_task_pt_regs
bpf_trace_vprintk
bpf_cgrp_storage_get
bpf_cgrp_storage_delete
bpf_dynptr_data
bpf_dynptr_from_mem
bpf_dynptr_read
bpf_dynptr_write
bpf_kptr_xchg
bpf_ktime_get_tai_ns
bpf_ringbuf_discard_dynptr
bpf_ringbuf_reserve_dynptr
bpf_ringbuf_submit_dynptr
bpf_user_ringbuf_drain
KFuncs
There are currently no kfuncs supported for this program type.