Program type BPF_PROG_TYPE_CGROUP_SOCK_ADDR
cGroup socket address programs are triggered when a process in a cGroup to which the program is attached uses socket-related syscalls. These programs can overwrite syscall arguments, such as addresses.
Usage
This program type can be used to overwrite the arguments of socket-related syscalls or to block the syscall entirely. Which syscall triggers the program depends on the attach type.
BPF_CGROUP_INET4_BIND and BPF_CGROUP_INET6_BIND
These attach types are triggered when a process calls the bind syscall with an IPv4 or IPv6 address, respectively. The typical ELF sections used for these attach types are cgroup/bind4 and cgroup/bind6.
Note
Since v5.12, the bit with value 2 (bit 1) of the return value indicates that the check for the CAP_NET_BIND_SERVICE capability can be skipped. Normally this capability is required to bind to a privileged port (<1024). So when a BPF program rewrites the port that a process without the capability binds to, it can set this bit to keep the kernel from rejecting the call.
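To make the return-value convention concrete, here is a small userspace C model of it. It is a sketch, not kernel code, and it assumes (based on our reading of kernels since v5.12) that when several cgroup/bind programs run for one bind() call, the call is allowed only if every program sets bit 0, while the capability check is skipped if any program sets the bit with value 2. The names bind_verdict, fold_bind_returns and bind_needs_cap are hypothetical, and 1024 stands in for the default value of the net.ipv4.ip_unprivileged_port_start sysctl.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model, not kernel code: the combined outcome of all
 * cgroup/bind programs that ran for one bind() call. */
struct bind_verdict {
    bool allow;      /* every program returned a value with bit 0 set */
    bool bypass_cap; /* at least one program set the bit with value 2 */
};

/* Fold the programs' return values, assuming bit 0 is AND-ed and the
 * value-2 bit is OR-ed across programs (our reading of v5.12+ kernels). */
static struct bind_verdict fold_bind_returns(const int *rets, size_t n)
{
    struct bind_verdict v = { .allow = true, .bypass_cap = false };

    for (size_t i = 0; i < n; i++) {
        if (!(rets[i] & 1))
            v.allow = false;      /* a single 0 in bit 0 rejects the call */
        if (rets[i] & 2)
            v.bypass_cap = true;  /* request to skip CAP_NET_BIND_SERVICE */
    }
    return v;
}

/* Would bind() to this host-order port still require CAP_NET_BIND_SERVICE?
 * Assumes the default unprivileged port start of 1024; port 0 (let the
 * kernel choose a port) never needs the capability. */
static bool bind_needs_cap(unsigned int port, struct bind_verdict v)
{
    return port != 0 && port < 1024 && !v.bypass_cap;
}
```

In this model, a program that returns (1 | 2) for port 111 lets an unprivileged process bind to that port, while a plain return value of 1 leaves the usual capability check in place.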
BPF_CGROUP_INET4_CONNECT and BPF_CGROUP_INET6_CONNECT
These attach types are triggered when a process calls the connect syscall with an IPv4 or IPv6 address, respectively. The typical ELF sections used for these attach types are cgroup/connect4 and cgroup/connect6.
BPF_CGROUP_UDP4_SENDMSG and BPF_CGROUP_UDP6_SENDMSG
These attach types are triggered when a process calls the sendmsg syscall on a UDP socket with an IPv4 or IPv6 address, respectively. The typical ELF sections used for these attach types are cgroup/sendmsg4 and cgroup/sendmsg6.
BPF_CGROUP_UDP4_RECVMSG and BPF_CGROUP_UDP6_RECVMSG
These attach types are triggered when a process calls the recvmsg syscall on a UDP socket with an IPv4 or IPv6 address, respectively. The typical ELF sections used for these attach types are cgroup/recvmsg4 and cgroup/recvmsg6.
BPF_CGROUP_INET4_GETPEERNAME and BPF_CGROUP_INET6_GETPEERNAME
These attach types are triggered when a process calls the getpeername syscall with an IPv4 or IPv6 address, respectively. The typical ELF sections used for these attach types are cgroup/getpeername4 and cgroup/getpeername6.
BPF_CGROUP_INET4_GETSOCKNAME and BPF_CGROUP_INET6_GETSOCKNAME
These attach types are triggered when a process calls the getsockname syscall with an IPv4 or IPv6 address, respectively. The typical ELF sections used for these attach types are cgroup/getsockname4 and cgroup/getsockname6.
Context
C structure
struct bpf_sock_addr {
    __u32 user_family;      /* Allows 4-byte read, but no write. */
    __u32 user_ip4;         /* Allows 1,2,4-byte read and 4-byte write.
                             * Stored in network byte order.
                             */
    __u32 user_ip6[4];      /* Allows 1,2,4,8-byte read and 4,8-byte write.
                             * Stored in network byte order.
                             */
    __u32 user_port;        /* Allows 1,2,4-byte read and 4-byte write.
                             * Stored in network byte order
                             */
    __u32 family;           /* Allows 4-byte read, but no write */
    __u32 type;             /* Allows 4-byte read, but no write */
    __u32 protocol;         /* Allows 4-byte read, but no write */
    __u32 msg_src_ip4;      /* Allows 1,2,4-byte read and 4-byte write.
                             * Stored in network byte order.
                             */
    __u32 msg_src_ip6[4];   /* Allows 1,2,4,8-byte read and 4,8-byte write.
                             * Stored in network byte order.
                             */
    __bpf_md_ptr(struct bpf_sock *, sk);
};
user_family
This field contains the address family passed to the syscall. Its value is one of the AF_* values defined in include/linux/socket.h.
The context allows 4-byte reads from the field, but no writes to it.
user_ip4
This field contains the IPv4 address passed to the syscall. Its value is stored in network byte order. This field is only valid for IPv4 attach types.
The context allows 1,2,4-byte reads and 4-byte writes.
user_ip6
This field contains the IPv6 address passed to the syscall. Its value is stored in network byte order. This field is only valid for IPv6 attach types.
The context allows 1,2,4,8-byte reads and 4,8-byte writes.
Note
8-byte wide loads are only supported since v5.3.
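The IPv6 fields are exposed as an array of four 32-bit words. The userspace sketch below (the helper name ip6_to_words is hypothetical) shows how a 16-byte struct in6_addr maps onto such an array: the bytes are copied unchanged, so every word stays in network byte order. On the BPF side this means, for example, that writing ::1 amounts to storing 0 into user_ip6[0] through user_ip6[2] and bpf_htonl(1) into user_ip6[3], one 4-byte store per word.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical helper: split an IPv6 address into the four 32-bit words
 * used by user_ip6[] and msg_src_ip6[]. The bytes are copied as-is, so
 * each word remains in network byte order. */
static void ip6_to_words(const struct in6_addr *addr, uint32_t words[4])
{
    memcpy(words, addr->s6_addr, sizeof(addr->s6_addr));
}
```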
user_port
This field contains the port number passed to the syscall. Its value is stored in network byte order.
The context allows 1,2,4-byte reads and 4-byte writes.
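As a sketch of what "network byte order" means for user_ip4 and user_port, the userspace C below copies the address and port of a struct sockaddr_in into a hypothetical stand-in struct (fake_sock_addr is not a kernel type). It assumes, as we understand the kernel's context conversion, that user_port holds the 16-bit sin_port value zero-extended into a 32-bit field. This is why the examples on this page compare against bpf_htons()/bpf_htonl() values rather than plain numbers.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdbool.h>
#include <stdint.h>
#include <sys/socket.h>

/* Hypothetical stand-in for two bpf_sock_addr fields; not a kernel type. */
struct fake_sock_addr {
    uint32_t user_ip4;  /* IPv4 address, network byte order */
    uint32_t user_port; /* 16-bit port in network byte order, zero-extended */
};

/* Copy the address and port out of a sockaddr_in without byte swapping. */
static void fill_from_sockaddr(struct fake_sock_addr *ctx,
                               const struct sockaddr_in *sin)
{
    ctx->user_ip4 = sin->sin_addr.s_addr;
    ctx->user_port = sin->sin_port;
}

/* Userspace equivalent of the BPF-side check
 *   ctx->user_ip4 == bpf_htonl(ip) && ctx->user_port == bpf_htons(port)
 * where ip and port are given in host byte order. */
static bool addr_matches(const struct fake_sock_addr *ctx, uint32_t ip,
                         uint16_t port)
{
    return ctx->user_ip4 == htonl(ip) && ctx->user_port == htons(port);
}
```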
family
This field contains the address family of the socket. Its value is one of the AF_* values defined in include/linux/socket.h.
The context allows 4-byte reads from the field, but no writes to it.
type
This field contains the socket type. Its value is one of the SOCK_* values defined in include/linux/net.h.
The context allows 4-byte reads from the field, but no writes to it.
protocol
This field contains the socket protocol. Its value is one of the IPPROTO_* values defined in include/uapi/linux/in.h.
The context allows 4-byte reads from the field, but no writes to it.
msg_src_ip4
This field contains the IPv4 source address of the message about to be sent. Its value is stored in network byte order.
This field is only valid for the BPF_CGROUP_UDP4_SENDMSG attach type.
The context allows 1,2,4-byte reads and 4-byte writes.
msg_src_ip6
This field contains the IPv6 source address of the message about to be sent. Its value is stored in network byte order.
This field is only valid for the BPF_CGROUP_UDP6_SENDMSG attach type.
The context allows 1,2,4,8-byte reads and 4,8-byte writes.
Note
8-byte wide loads are only supported since v5.3.
sk
This field contains a pointer to the socket for which the program was invoked; its type is struct bpf_sock.
Attachment
cGroup socket address programs are attached to cGroups via the BPF_PROG_ATTACH command of the bpf syscall or via a BPF link.
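A minimal sketch of the BPF_PROG_ATTACH route using the raw bpf(2) syscall, assuming the program is already loaded (prog_fd) and cgroup_fd is an open file descriptor of a cgroup v2 directory. The helper name attach_sock_addr_prog is hypothetical. libbpf wraps the same command in bpf_prog_attach(), while bpf_program__attach_cgroup() takes the BPF link route instead.

```c
#include <linux/bpf.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical helper: attach a loaded BPF_PROG_TYPE_CGROUP_SOCK_ADDR
 * program to a cgroup with the BPF_PROG_ATTACH command of bpf(2).
 * Returns 0 on success, or -1 with errno set. */
static int attach_sock_addr_prog(int prog_fd, int cgroup_fd,
                                 enum bpf_attach_type type)
{
    union bpf_attr attr;

    memset(&attr, 0, sizeof(attr));        /* unused fields must be zero */
    attr.target_fd = cgroup_fd;            /* cgroup v2 directory fd */
    attr.attach_bpf_fd = prog_fd;          /* fd of the loaded program */
    attr.attach_type = type;               /* e.g. BPF_CGROUP_INET4_CONNECT */
    attr.attach_flags = BPF_F_ALLOW_MULTI; /* allow other programs on this hook */

    return (int)syscall(SYS_bpf, BPF_PROG_ATTACH, &attr, sizeof(attr));
}
```

For example, open the cGroup directory with open("/sys/fs/cgroup/my-app", O_RDONLY | O_DIRECTORY) (the path is a placeholder) and pass the resulting descriptor as cgroup_fd together with BPF_CGROUP_INET4_CONNECT. The call fails with -1 if either descriptor is invalid or the caller lacks the required privileges.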
Example
BPF_CGROUP_INET4_BIND and BPF_CGROUP_INET6_BIND
// SPDX-License-Identifier: GPL-2.0
#include <linux/stddef.h>
#include <linux/bpf.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_endian.h>

static __always_inline int bind_prog(struct bpf_sock_addr *ctx, int family)
{
    struct bpf_sock *sk;

    sk = ctx->sk;
    if (!sk)
        return 0;

    if (sk->family != family)
        return 0;

    if (ctx->type != SOCK_STREAM)
        return 0;

    /* Return 1 OR'ed with 2 (bit 1 set) to indicate
     * that CAP_NET_BIND_SERVICE should be bypassed.
     */
    if (ctx->user_port == bpf_htons(111))
        return (1 | 2);

    return 1;
}

SEC("cgroup/bind4")
int bind_v4_prog(struct bpf_sock_addr *ctx)
{
    return bind_prog(ctx, AF_INET);
}

SEC("cgroup/bind6")
int bind_v6_prog(struct bpf_sock_addr *ctx)
{
    return bind_prog(ctx, AF_INET6);
}

char _license[] SEC("license") = "GPL";
BPF_CGROUP_INET4_CONNECT, BPF_CGROUP_INET4_GETSOCKNAME, and BPF_CGROUP_INET4_GETPEERNAME
// SPDX-License-Identifier: GPL-2.0
#include <string.h>
#include <stdbool.h>
#include <linux/bpf.h>
#include <linux/in.h>
#include <linux/in6.h>
#include <sys/socket.h>
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_endian.h>
/* Header from the kernel BPF selftests; provides get_set_sk_priority(). */
#include <bpf_sockopt_helpers.h>

char _license[] SEC("license") = "GPL";

struct svc_addr {
    __be32 addr;
    __be16 port;
};

struct {
    __uint(type, BPF_MAP_TYPE_SK_STORAGE);
    __uint(map_flags, BPF_F_NO_PREALLOC);
    __type(key, int);
    __type(value, struct svc_addr);
} service_mapping SEC(".maps");

SEC("cgroup/connect4")
int connect4(struct bpf_sock_addr *ctx)
{
    struct sockaddr_in sa = {};
    struct svc_addr *orig;

    /* Force local address to 127.0.0.1:22222. */
    sa.sin_family = AF_INET;
    sa.sin_port = bpf_htons(22222);
    sa.sin_addr.s_addr = bpf_htonl(0x7f000001);

    if (bpf_bind(ctx, (struct sockaddr *)&sa, sizeof(sa)) != 0)
        return 0;

    /* Rewire service 1.2.3.4:60000 to backend 127.0.0.1:60123. */
    if (ctx->user_port == bpf_htons(60000)) {
        orig = bpf_sk_storage_get(&service_mapping, ctx->sk, 0,
                                  BPF_SK_STORAGE_GET_F_CREATE);
        if (!orig)
            return 0;

        orig->addr = ctx->user_ip4;
        orig->port = ctx->user_port;

        ctx->user_ip4 = bpf_htonl(0x7f000001);
        ctx->user_port = bpf_htons(60123);
    }
    return 1;
}

SEC("cgroup/getsockname4")
int getsockname4(struct bpf_sock_addr *ctx)
{
    if (!get_set_sk_priority(ctx))
        return 1;

    /* Expose local server as 1.2.3.4:60000 to client. */
    if (ctx->user_port == bpf_htons(60123)) {
        ctx->user_ip4 = bpf_htonl(0x01020304);
        ctx->user_port = bpf_htons(60000);
    }
    return 1;
}

SEC("cgroup/getpeername4")
int getpeername4(struct bpf_sock_addr *ctx)
{
    struct svc_addr *orig;

    if (!get_set_sk_priority(ctx))
        return 1;

    /* Expose service 1.2.3.4:60000 as peer instead of backend. */
    if (ctx->user_port == bpf_htons(60123)) {
        orig = bpf_sk_storage_get(&service_mapping, ctx->sk, 0, 0);
        if (orig) {
            ctx->user_ip4 = orig->addr;
            ctx->user_port = orig->port;
        }
    }
    return 1;
}
Helper functions
Supported helper functions
bpf_bind
bpf_cgrp_storage_delete
bpf_cgrp_storage_get
bpf_dynptr_data
bpf_dynptr_from_mem
bpf_dynptr_read
bpf_dynptr_write
bpf_for_each_map_elem
bpf_get_cgroup_classid
bpf_get_current_ancestor_cgroup_id
bpf_get_current_cgroup_id
bpf_get_current_comm
bpf_get_current_pid_tgid
bpf_get_current_task
bpf_get_current_task_btf
bpf_get_current_uid_gid
bpf_get_local_storage
bpf_get_netns_cookie
bpf_get_ns_current_pid_tgid (v6.10)
bpf_get_numa_node_id
bpf_get_prandom_u32
bpf_get_smp_processor_id
bpf_get_socket_cookie
bpf_getsockopt (v5.8)
bpf_jiffies64
bpf_kptr_xchg
bpf_ktime_get_boot_ns
bpf_ktime_get_ns
bpf_ktime_get_tai_ns
bpf_loop
bpf_map_delete_elem
bpf_map_lookup_elem
bpf_map_lookup_percpu_elem
bpf_map_peek_elem
bpf_map_pop_elem
bpf_map_push_elem
bpf_map_update_elem
bpf_per_cpu_ptr
bpf_perf_event_output
bpf_probe_read_kernel
bpf_probe_read_kernel_str
bpf_probe_read_user
bpf_probe_read_user_str
bpf_ringbuf_discard
bpf_ringbuf_discard_dynptr
bpf_ringbuf_output
bpf_ringbuf_query
bpf_ringbuf_reserve
bpf_ringbuf_reserve_dynptr
bpf_ringbuf_submit
bpf_ringbuf_submit_dynptr
bpf_setsockopt (v5.8)
bpf_sk_lookup_tcp
bpf_sk_lookup_udp
bpf_sk_release
bpf_sk_storage_delete
bpf_sk_storage_get
bpf_skc_lookup_tcp
bpf_snprintf
bpf_snprintf_btf
bpf_spin_lock
bpf_spin_unlock
bpf_strncmp
bpf_tail_call
bpf_task_pt_regs
bpf_this_cpu_ptr
bpf_timer_cancel
bpf_timer_init
bpf_timer_set_callback
bpf_timer_start
bpf_trace_printk
bpf_trace_vprintk
bpf_user_ringbuf_drain
KFuncs
Supported kfuncs
bpf_arena_alloc_pages
bpf_arena_free_pages
bpf_cast_to_kern_ctx
bpf_dynptr_adjust
bpf_dynptr_clone
bpf_dynptr_is_null
bpf_dynptr_is_rdonly
bpf_dynptr_size
bpf_dynptr_slice
bpf_dynptr_slice_rdwr
bpf_iter_bits_destroy
bpf_iter_bits_new
bpf_iter_bits_next
bpf_iter_css_destroy
bpf_iter_css_new
bpf_iter_css_next
bpf_iter_css_task_destroy
bpf_iter_css_task_new
bpf_iter_css_task_next
bpf_iter_num_destroy
bpf_iter_num_new
bpf_iter_num_next
bpf_iter_task_destroy
bpf_iter_task_new
bpf_iter_task_next
bpf_iter_task_vma_destroy
bpf_iter_task_vma_new
bpf_iter_task_vma_next
bpf_map_sum_elem_count
bpf_preempt_disable
bpf_preempt_enable
bpf_rcu_read_lock
bpf_rcu_read_unlock
bpf_rdonly_cast
bpf_sock_addr_set_sun_path
bpf_wq_init
bpf_wq_set_callback_impl
bpf_wq_start