Program type BPF_PROG_TYPE_CGROUP_SYSCTL
cGroup sysctl programs are called when a process in the cGroup to which the program is attached attempts to read or write a sysctl option in the proc file system.
Usage
cGroup sysctl programs are typically located in the cgroup/sysctl ELF section. These programs can be used to inspect and filter sysctl usage.
These programs must return one of the following return codes:
- 0 means "reject access to sysctl"
- 1 means "proceed with access"
If the program returns 0, user space will get -1 from read(2) or write(2) and errno will be set to EPERM.
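From the caller's side, a rejection by the program looks like any other permission failure. The following is a minimal user-space sketch of how an application might tell the outcomes apart. The names classify_sysctl_io, read_sysctl and the sysctl_outcome enum are hypothetical and exist only for this illustration. EPERM can also come from sources other than a BPF program, so the label is a hint rather than proof.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical outcome labels used only in this sketch. */
enum sysctl_outcome {
    SYSCTL_OK,      /* read(2)/write(2) returned a byte count */
    SYSCTL_DENIED,  /* -1 with errno == EPERM, e.g. a BPF program returned 0 */
    SYSCTL_FAILED,  /* -1 with any other errno */
};

/* Classify the result of a read(2) or write(2) on a sysctl file. */
static enum sysctl_outcome classify_sysctl_io(ssize_t n, int err)
{
    if (n >= 0)
        return SYSCTL_OK;
    return err == EPERM ? SYSCTL_DENIED : SYSCTL_FAILED;
}

/* Read a sysctl file into buf and NUL-terminate it. The program runs at
 * read(2) time, so a failing open(2) is reported as a plain failure. */
static enum sysctl_outcome read_sysctl(const char *path, char *buf, size_t len)
{
    assert(len > 0);

    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return SYSCTL_FAILED;

    ssize_t n = read(fd, buf, len - 1);
    int err = errno;          /* save errno before close(2) can change it */
    close(fd);

    if (n >= 0)
        buf[n] = '\0';
    return classify_sysctl_io(n, err);
}
```

A call such as read_sysctl("/proc/sys/net/ipv4/tcp_mem", buf, sizeof(buf)) returning SYSCTL_DENIED would be consistent with an attached program returning 0 for that read.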
Note
BPF_PROG_TYPE_CGROUP_SYSCTL is intended to be used in a trusted root environment, for example to monitor sysctl usage or to catch unreasonable values that an application, running as root in a separate cGroup, is trying to set.
Since task_dfl_cgroup(current) is called at sys_read / sys_write time, it may return results different from those at sys_open time, i.e. the process that opened the sysctl file in the proc file system may differ from the process that is trying to read from or write to it, and the two processes may run in different cGroups. This means BPF_PROG_TYPE_CGROUP_SYSCTL should not be used as a security mechanism to limit sysctl usage.
As with any cGroup-bpf program, additional care should be taken if an application running as root in a cGroup should not be allowed to detach or replace the BPF program attached by the administrator.
Special helpers
Since a sysctl knob is represented by a name and a value, sysctl-specific BPF helpers focus on providing access to these properties:
- bpf_sysctl_get_name to get the sysctl name, as it is visible in /proc/sys, into a buffer provided by the BPF program;
- bpf_sysctl_get_current_value to get the string value currently held by the sysctl into a buffer provided by the BPF program. This helper is available on both read(2) from and write(2) to the sysctl;
- bpf_sysctl_get_new_value to get the new string value currently being written to the sysctl before the actual write happens. This helper can be used only when ctx->write == 1;
- bpf_sysctl_set_new_value to override the new string value currently being written to the sysctl before the actual write happens. The sysctl value will be overridden starting from the current ctx->file_pos. If the whole value has to be overridden, the BPF program can set file_pos to zero before calling the helper. This helper can be used only when ctx->write == 1. The new string value set by the helper is treated and verified by the kernel in the same way as an equivalent string passed by user space.
A BPF program sees a sysctl value the same way user space does in the proc file system, i.e. as a string. Since many sysctl values represent an integer or a vector of integers, the following helpers can be used to get a numeric value from the string:
- bpf_strtol() to convert the initial part of the string to a long integer, similar to user space strtol(3)
- bpf_strtoul() to convert the initial part of the string to an unsigned long integer, similar to user space strtoul(3)
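To make the "vector of integers" case concrete, here is a user-space analogue built on strtoul(3). It is a sketch, not the kernel's implementation: consume_ulong and parse_ulong_vec are hypothetical names, and consume_ulong only imitates the convention of returning the number of characters consumed so that a caller can advance through the string.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/* Convert the initial part of s to an unsigned long. Returns the number of
 * characters consumed (> 0) on success, or a negative errno value. */
static int consume_ulong(const char *s, int base, unsigned long *res)
{
    const char *p = s;
    char *end;

    /* strtoul(3) silently accepts a leading '-'; this sketch rejects it. */
    while (isspace((unsigned char)*p))
        p++;
    if (*p == '-')
        return -EINVAL;

    errno = 0;
    *res = strtoul(s, &end, base);
    if (end == s)
        return -EINVAL;   /* no digits found */
    if (errno == ERANGE)
        return -ERANGE;   /* value does not fit in unsigned long */
    return (int)(end - s);
}

/* Parse up to n whitespace-separated unsigned integers from value, e.g.
 * "190893 254524 381786". Returns how many were parsed. */
static size_t parse_ulong_vec(const char *value, unsigned long *out, size_t n)
{
    size_t i, off = 0;

    assert(value && out);
    for (i = 0; i < n; i++) {
        int used = consume_ulong(value + off, 10, &out[i]);

        if (used <= 0)
            break;
        off += (size_t)used;   /* continue right after the parsed number */
    }
    return i;
}
```

The offset bookkeeping in parse_ulong_vec is the same idea a BPF program can apply when it calls bpf_strtoul repeatedly on a value such as the one held by net/ipv4/tcp_mem.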
Context
struct bpf_sysctl {
    __u32 write;
    __u32 file_pos;
};
write
This field indicates whether the sysctl value is being read (0) or written (1). This field is read-only.
file_pos
This field indicates the file position at which the sysctl is being accessed, for either a read or a write. This field is read-write. Writing to the field sets the starting position in the sysctl proc file that read(2) will read from or write(2) will write to. Writing zero to the field can be used, for example, to override the whole sysctl value with bpf_sysctl_set_new_value on write(2), even when user space called it with file_pos > 0. Writing a non-zero value to the field can be used to access part of the sysctl value starting from the specified file_pos. Not all sysctls support access with file_pos != 0, e.g. writes to numeric sysctl entries must always be at file position 0. See also the kernel.sysctl_writes_strict sysctl.
Attachment
cGroup sysctl programs are attached to cGroups via the BPF_PROG_ATTACH command of the bpf(2) syscall or via BPF link.
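As a rough illustration of the BPF_PROG_ATTACH path, the sketch below fills union bpf_attr and issues the bpf(2) syscall directly. It assumes kernel headers that define BPF_CGROUP_SYSCTL, that cgroup_fd refers to an opened cGroup v2 directory, and that prog_fd refers to an already loaded program of this type. The function names are hypothetical; in practice a loader library would usually do this on your behalf.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <linux/bpf.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Fill the BPF_PROG_ATTACH attributes for a cGroup sysctl program.
 * cgroup_fd: fd of an opened cGroup v2 directory (assumption).
 * prog_fd:   fd of a loaded BPF_PROG_TYPE_CGROUP_SYSCTL program (assumption). */
static void fill_sysctl_attach_attr(union bpf_attr *attr, int cgroup_fd,
                                    int prog_fd, unsigned int flags)
{
    assert(attr);
    memset(attr, 0, sizeof(*attr));
    attr->target_fd = cgroup_fd;
    attr->attach_bpf_fd = prog_fd;
    attr->attach_type = BPF_CGROUP_SYSCTL;
    attr->attach_flags = flags;   /* e.g. 0 or BPF_F_ALLOW_MULTI */
}

/* Attach the program; returns 0 on success, -1 with errno set otherwise. */
static int attach_sysctl_prog(int cgroup_fd, int prog_fd, unsigned int flags)
{
    union bpf_attr attr;

    fill_sysctl_attach_attr(&attr, cgroup_fd, prog_fd, flags);
    return (int)syscall(SYS_bpf, BPF_PROG_ATTACH, &attr, sizeof(attr));
}
```

Detaching follows the same pattern with BPF_PROG_DETACH. Whether to pass BPF_F_ALLOW_MULTI depends on whether other programs should be able to coexist on the same cGroup.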
Example
// SPDX-License-Identifier: GPL-2.0
// Copyright (c) 2019 Facebook
#include <stdint.h>
#include <string.h>
#include <linux/stddef.h>
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>
/* Max supported length of a string with unsigned long in base 10 (pow2 - 1). */
#define MAX_ULONG_STR_LEN 0xF
/* Max supported length of sysctl value string (pow2). */
#define MAX_VALUE_STR_LEN 0x40
#ifndef ARRAY_SIZE
#define ARRAY_SIZE(x) (sizeof(x) / sizeof((x)[0]))
#endif
const char tcp_mem_name[] = "net/ipv4/tcp_mem";
static __always_inline int is_tcp_mem(struct bpf_sysctl *ctx)
{
    unsigned char i;
    char name[sizeof(tcp_mem_name)];
    int ret;

    memset(name, 0, sizeof(name));
    ret = bpf_sysctl_get_name(ctx, name, sizeof(name), 0);
    if (ret < 0 || ret != sizeof(tcp_mem_name) - 1)
        return 0;

#pragma clang loop unroll(full)
    for (i = 0; i < sizeof(tcp_mem_name); ++i)
        if (name[i] != tcp_mem_name[i])
            return 0;

    return 1;
}

SEC("cgroup/sysctl")
int sysctl_tcp_mem(struct bpf_sysctl *ctx)
{
    unsigned long tcp_mem[3] = {0, 0, 0};
    char value[MAX_VALUE_STR_LEN];
    unsigned char i, off = 0;
    volatile int ret;

    if (ctx->write)
        return 0;

    if (!is_tcp_mem(ctx))
        return 0;

    ret = bpf_sysctl_get_current_value(ctx, value, MAX_VALUE_STR_LEN);
    if (ret < 0 || ret >= MAX_VALUE_STR_LEN)
        return 0;

#pragma clang loop unroll(full)
    for (i = 0; i < ARRAY_SIZE(tcp_mem); ++i) {
        ret = bpf_strtoul(value + off, MAX_ULONG_STR_LEN, 0,
                          tcp_mem + i);
        if (ret <= 0 || ret > MAX_ULONG_STR_LEN)
            return 0;
        off += ret & MAX_ULONG_STR_LEN;
    }

    return tcp_mem[0] < tcp_mem[1] && tcp_mem[1] < tcp_mem[2];
}

char _license[] SEC("license") = "GPL";
Helper functions
Supported helper functions
bpf_cgrp_storage_delete
bpf_cgrp_storage_get
bpf_dynptr_data
bpf_dynptr_from_mem
bpf_dynptr_read
bpf_dynptr_write
bpf_for_each_map_elem
bpf_get_current_cgroup_id
bpf_get_current_pid_tgid (v6.10)
bpf_get_current_task
bpf_get_current_task_btf
bpf_get_current_uid_gid
bpf_get_local_storage
bpf_get_ns_current_pid_tgid (v6.10)
bpf_get_numa_node_id
bpf_get_prandom_u32
bpf_get_retval
bpf_get_smp_processor_id
bpf_jiffies64
bpf_kptr_xchg
bpf_ktime_get_boot_ns
bpf_ktime_get_coarse_ns
bpf_ktime_get_ns
bpf_ktime_get_tai_ns
bpf_loop
bpf_map_delete_elem
bpf_map_lookup_elem
bpf_map_lookup_percpu_elem
bpf_map_peek_elem
bpf_map_pop_elem
bpf_map_push_elem
bpf_map_update_elem
bpf_per_cpu_ptr
bpf_perf_event_output
bpf_probe_read_kernel
bpf_probe_read_kernel_str
bpf_probe_read_user
bpf_probe_read_user_str
bpf_ringbuf_discard
bpf_ringbuf_discard_dynptr
bpf_ringbuf_output
bpf_ringbuf_query
bpf_ringbuf_reserve
bpf_ringbuf_reserve_dynptr
bpf_ringbuf_submit
bpf_ringbuf_submit_dynptr
bpf_set_retval
bpf_snprintf
bpf_snprintf_btf
bpf_spin_lock
bpf_spin_unlock
bpf_strncmp
bpf_strtol
bpf_strtoul
bpf_sysctl_get_current_value
bpf_sysctl_get_name
bpf_sysctl_get_new_value
bpf_sysctl_set_new_value
bpf_tail_call
bpf_task_pt_regs
bpf_this_cpu_ptr
bpf_timer_cancel
bpf_timer_init
bpf_timer_set_callback
bpf_timer_start
bpf_trace_printk
bpf_trace_vprintk
bpf_user_ringbuf_drain
KFuncs
There are currently no kfuncs supported for this program type.