Program type BPF_PROG_TYPE_SK_SKB
Socket SKB programs are called on L4 data streams to parse L7 messages and/or to determine if the L4/L7 messages should be allowed, blocked or redirected.
Usage
Socket SKB programs are attached to BPF_MAP_TYPE_SOCKMAP or BPF_MAP_TYPE_SOCKHASH maps and will be invoked when messages get received on the sockets which are part of the map the program is attached to. The exact purpose of the program differs depending on its attach type.
As BPF_SK_SKB_STREAM_PARSER program
When this attach type is used, the program acts as a stream parser. A stream parser parses message-based application layer protocols (OSI Layer 7) that are implemented on top of data streams such as TCP.
The job of the program is to parse the L7 data and tell the kernel how long the L7 message is. This allows the kernel to combine multiple data stream packets and return a complete L7 message for every recv, instead of returning TCP segments which might contain only part of an L7 message.
The return value is interpreted as follows:
>0 - indicates the length of a successfully parsed message
0 - indicates that more data must be received to parse the message
-ESTRPIPE - the current message should not be processed by the kernel; control of the socket is returned to userspace, which can proceed to read the messages itself
other < 0 - error in parsing; control is given back to userspace, assuming that synchronization is lost and the stream is unrecoverable (the application is expected to close the TCP socket)
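The return-value contract above can be illustrated with a minimal userspace sketch. It assumes a hypothetical L7 protocol in which every message starts with a 4-byte big-endian header holding the payload length. The function name `frame_len`, the header layout, and the `MAX_MSG_LEN` limit are assumptions made for illustration and are not part of the kernel API. In an actual `sk_skb/stream_parser` program the header bytes would be read from the `__sk_buff` context (for example with `bpf_skb_load_bytes`) rather than from a plain buffer.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define HDR_LEN 4u          /* hypothetical header: 4-byte big-endian payload length */
#define MAX_MSG_LEN 65536u  /* hypothetical upper bound on a plausible payload */

/*
 * Mirrors the stream parser return convention:
 *   > 0  total length of the message (header + payload)
 *   0    not enough data yet to read the header
 *   < 0  the stream cannot be parsed
 */
static int frame_len(const uint8_t *data, size_t avail)
{
    uint32_t payload;

    if (avail < HDR_LEN)
        return 0; /* ask for more data */

    payload = ((uint32_t)data[0] << 24) | ((uint32_t)data[1] << 16) |
              ((uint32_t)data[2] << 8) | (uint32_t)data[3];

    if (payload > MAX_MSG_LEN)
        return -EPROTO; /* implausible length: treat as a parse error */

    return (int)(HDR_LEN + payload);
}
```

For a buffer holding `00 00 00 02 'h' 'i'` this sketch reports a message length of 6, while a 2-byte fragment yields 0 so that more data is awaited.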
Note
Before v5.10 it was required to have a stream parser attached to a BPF_MAP_TYPE_SOCKMAP if you wanted to use the stream verdict as well. On newer versions this is no longer required.
On the older kernels, a no-op program can be used to just return the length of the current skb to retain default behavior and pass verdict per TCP packet.
SEC("sk_skb/stream_parser")
int noop_parser(struct __sk_buff *skb)
{
    return skb->len;
}
As BPF_SK_SKB_STREAM_VERDICT program
When this attach type is used the program acts as a filter, comparable to TC or XDP programs. The program gets called for every message indicated by the parser (or TCP packet if no parser is specified) and returns a verdict.
The return value is interpreted as follows:
SK_PASS - The message may pass to the socket or it has been redirected with a helper.
SK_DROP - The message should be dropped.
Unlike TC or XDP programs, there is no special redirect return code; helpers such as bpf_sk_redirect_map return SK_PASS on success.
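As a rough illustration of the two verdicts, the sketch below drops every message that arrives on one local port and passes everything else. The function name `drop_blocked_port` and the `BLOCKED_PORT` value are hypothetical. The `SEC("sk_skb/stream_verdict")` annotation from `<bpf/bpf_helpers.h>` is left out so that the function compiles as ordinary C against `<linux/bpf.h>`; a real BPF object would need it.

```c
#include <linux/bpf.h>

#define BLOCKED_PORT 9999 /* hypothetical port, chosen only for illustration */

/*
 * Body of a verdict program. In a BPF object this function would be
 * annotated with SEC("sk_skb/stream_verdict").
 */
int drop_blocked_port(struct __sk_buff *skb)
{
    /* local_port is stored in host byte order */
    if (skb->local_port == BLOCKED_PORT)
        return SK_DROP;

    return SK_PASS;
}
```

Because the function only reads the context and returns an `SK_*` value, it can be exercised in userspace by passing it a zero-initialized `struct __sk_buff` with `local_port` set.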
As BPF_SK_SKB_VERDICT program
The non-stream verdict attach type is a replacement for the BPF_SK_SKB_STREAM_VERDICT attach type. It has the same job and uses the same return values. The difference is that the stream verdict variant only supports TCP data streams, while BPF_SK_SKB_VERDICT also supports UDP.
Context
Socket SKB programs are called by the kernel with a __sk_buff context.
This program type isn't allowed to read from and write to all fields of the context since doing so might break assumptions in the kernel or because data isn't available at the point where the program is hooked into the kernel.
Attachment
Socket SKB programs are attached to BPF_MAP_TYPE_SOCKMAP or BPF_MAP_TYPE_SOCKHASH using the BPF_PROG_ATTACH syscall (bpf_prog_attach libbpf function).
The programs should be loaded with the same expected attach type as the one used when attaching them.
Note
BPF_SK_SKB_STREAM_VERDICT and BPF_SK_SKB_VERDICT are mutually exclusive per map; only one or the other attach type can be used.
Example
Example BPF program:
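// Copyright Red Hat
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

// sock_map_rx is a BPF_MAP_TYPE_SOCKMAP defined elsewhere in the BPF object.
SEC("sk_skb/stream_verdict")
int bpf_prog_verdict(struct __sk_buff *skb)
{
    __u32 lport = skb->local_port; // local_port is in host byte order
    __u32 idx = 0;

    // Redirect messages arriving on local port 10000 to the socket at index 0.
    if (lport == 10000)
        return bpf_sk_redirect_map(skb, &sock_map_rx, idx, 0);
    return SK_PASS;
}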
Example userspace loader code:
// Copyright Red Hat
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <bpf/bpf.h>

int create_sample_sockmap(int sock, int parse_prog_fd, int verdict_prog_fd)
{
    int index = 0;
    int map, err;

    // Create a sockmap with a single slot.
    map = bpf_map_create(BPF_MAP_TYPE_SOCKMAP, NULL, sizeof(int), sizeof(int), 1, NULL);
    if (map < 0) {
        fprintf(stderr, "Failed to create sockmap: %s\n", strerror(errno));
        return -1;
    }

    // Attach the stream parser and the stream verdict programs to the map.
    err = bpf_prog_attach(parse_prog_fd, map, BPF_SK_SKB_STREAM_PARSER, 0);
    if (err) {
        fprintf(stderr, "Failed to attach_parser_prog_to_map: %s\n", strerror(errno));
        goto out;
    }

    err = bpf_prog_attach(verdict_prog_fd, map, BPF_SK_SKB_STREAM_VERDICT, 0);
    if (err) {
        fprintf(stderr, "Failed to attach_verdict_prog_to_map: %s\n", strerror(errno));
        goto out;
    }

    // Insert the socket at index 0; fail if the slot is already taken.
    err = bpf_map_update_elem(map, &index, &sock, BPF_NOEXIST);
    if (err) {
        fprintf(stderr, "Failed to update sockmap: %s\n", strerror(errno));
        goto out;
    }

out:
    // The map fd is closed on both the success and the error path.
    close(map);
    return err;
}
Helper functions
Supported helper functions
bpf_cgrp_storage_delete
bpf_cgrp_storage_get
bpf_dynptr_data
bpf_dynptr_from_mem
bpf_dynptr_read
bpf_dynptr_write
bpf_for_each_map_elem
bpf_get_current_task
bpf_get_current_task_btf
bpf_get_numa_node_id
bpf_get_prandom_u32
bpf_get_smp_processor_id
bpf_get_socket_cookie
bpf_get_socket_uid
bpf_jiffies64
bpf_kptr_xchg
bpf_ktime_get_boot_ns
bpf_ktime_get_coarse_ns
bpf_ktime_get_ns
bpf_ktime_get_tai_ns
bpf_loop
bpf_map_delete_elem
bpf_map_lookup_elem
bpf_map_lookup_percpu_elem
bpf_map_peek_elem
bpf_map_pop_elem
bpf_map_push_elem
bpf_map_update_elem
bpf_per_cpu_ptr
bpf_perf_event_output
bpf_probe_read_kernel
bpf_probe_read_kernel_str
bpf_probe_read_user
bpf_probe_read_user_str
bpf_ringbuf_discard
bpf_ringbuf_discard_dynptr
bpf_ringbuf_output
bpf_ringbuf_query
bpf_ringbuf_reserve
bpf_ringbuf_reserve_dynptr
bpf_ringbuf_submit
bpf_ringbuf_submit_dynptr
bpf_sk_lookup_tcp
bpf_sk_lookup_udp
bpf_sk_redirect_hash
bpf_sk_redirect_map
bpf_sk_release
bpf_skb_adjust_room
bpf_skb_change_head
bpf_skb_change_tail
bpf_skb_load_bytes
bpf_skb_pull_data
bpf_skb_store_bytes
bpf_skc_lookup_tcp
bpf_skc_to_tcp6_sock
bpf_skc_to_tcp_request_sock
bpf_skc_to_tcp_sock
bpf_skc_to_tcp_timewait_sock
bpf_skc_to_udp6_sock
bpf_skc_to_unix_sock
bpf_snprintf
bpf_snprintf_btf
bpf_spin_lock
bpf_spin_unlock
bpf_strncmp
bpf_tail_call
bpf_task_pt_regs
bpf_this_cpu_ptr
bpf_timer_cancel
bpf_timer_init
bpf_timer_set_callback
bpf_timer_start
bpf_trace_printk
bpf_trace_vprintk
bpf_user_ringbuf_drain
KFuncs
Supported kfuncs
bpf_arena_alloc_pages
bpf_arena_free_pages
bpf_cast_to_kern_ctx
bpf_dynptr_adjust
bpf_dynptr_clone
bpf_dynptr_from_skb
bpf_dynptr_is_null
bpf_dynptr_is_rdonly
bpf_dynptr_size
bpf_dynptr_slice
bpf_dynptr_slice_rdwr
bpf_iter_bits_destroy
bpf_iter_bits_new
bpf_iter_bits_next
bpf_iter_css_destroy
bpf_iter_css_new
bpf_iter_css_next
bpf_iter_css_task_destroy
bpf_iter_css_task_new
bpf_iter_css_task_next
bpf_iter_num_destroy
bpf_iter_num_new
bpf_iter_num_next
bpf_iter_task_destroy
bpf_iter_task_new
bpf_iter_task_next
bpf_iter_task_vma_destroy
bpf_iter_task_vma_new
bpf_iter_task_vma_next
bpf_map_sum_elem_count
bpf_preempt_disable
bpf_preempt_enable
bpf_rcu_read_lock
bpf_rcu_read_unlock
bpf_rdonly_cast
bpf_wq_init
bpf_wq_set_callback_impl
bpf_wq_start