Program context __sk_buff
The socket buffer context is provided to program types that deal with network packets when a socket buffer has already been created/allocated. The struct __sk_buff is a "mirror" of the struct sk_buff type which is actually used by the kernel.
Accesses to the struct __sk_buff
pointer are seamlessly transformed into accesses into the real socket buffer. This indirection exists to provide a stable ABI for programs since the struct sk_buff
may change between kernel versions and to provide a layer of checks. Not all program types are allowed to read and/or write to certain fields for a number of reasons.
Direct packet access
Fields
len
This field holds the total length of the packet. It is important to know that this doesn't indicate the amount of data that is available via direct packet access. In some cases the packet is larger than a single memory page, in which case part of the packet data lives in non-linear memory. The len might then be larger than data_end - data, and specialized helpers are needed to access the rest of the memory.
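The relationship between len and the linear area can be sketched as follows. This is a plain C illustration, not an eBPF program: struct skb_view and both function names are hypothetical stand-ins, and in a real program the context type comes from <linux/bpf.h>, where data and data_end are 32-bit fields that the verifier rewrites into pointers.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the three __sk_buff fields used here. */
struct skb_view {
    uint32_t  len;       /* total packet length, linear plus non-linear */
    uintptr_t data;      /* start of the linear packet data */
    uintptr_t data_end;  /* one past the end of the linear packet data */
};

/* Number of bytes reachable through direct packet access. */
static uint32_t linear_len(const struct skb_view *skb)
{
    return (uint32_t)(skb->data_end - skb->data);
}

/* True when part of the packet lives outside the linear area. */
static bool has_nonlinear_data(const struct skb_view *skb)
{
    return skb->len > linear_len(skb);
}
```

When the second function returns true, an eBPF program would need a helper such as bpf_skb_load_bytes or bpf_skb_pull_data to reach the remaining bytes.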
pkt_type
This field indicates the type of the packet which informs "who" the packet is for. Possible values of this field are
the PACKET_*
values defined in include/uapi/linux/if_packet.h
.
PACKET_HOST - indicates the packet is addressed to the MAC address of this host.
PACKET_BROADCAST - indicates the packet is addressed to a broadcast address.
PACKET_MULTICAST - indicates the packet is addressed to a multicast address.
PACKET_OTHERHOST - indicates the packet is addressed to some other host and has been caught by a device driver in promiscuous mode.
PACKET_OUTGOING - indicates the packet originated from the local host and is looped back to a packet socket.
Note
This is not an exhaustive list of possible values.
mark
This field is a general purpose 32 bit tag used in the network subsystem to carry metadata with global implications across network subsystems. As an example, a driver could mark an incoming packet, and that mark could then be used by the ingress traffic control classifier-action subsystem, netfilter, and IPsec to execute provisioned policies. 1
queue_mapping
This field indicates via which TX queue on the NIC this packet should be sent. Typically this field is set by TC but can be overwritten by certain eBPF programs to implement custom balancing logic. 2
protocol
This field indicates the Layer 3 protocol of the packet and is one of the ETH_P_*
values defined in include/uapi/linux/if_ether.h
.
vlan_present
This field is a boolean 0
or 1
and indicates if the packet has a VLAN header.
vlan_tci
This field contains the VLAN TCI (Tag Control Information), if the packet included a VLAN header.
vlan_proto
This field contains the protocol ID of the used VLAN protocol which will be one of the ETH_P_*
values defined in include/uapi/linux/if_ether.h
.
priority
This field indicates the queuing priority of the packet. Packets with higher priority will be sent out first. Only values between 0 and 63 are effective; values of 64 and above will be converted to 63. This field only takes effect if the skbprio queueing discipline has been configured in TC. 3
This only affects egress traffic since ingress traffic is never queued.
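The clamping described above can be sketched in a few lines of plain C. The function and macro names are hypothetical and only restate the rule from the text; they are not kernel or skbprio API.

```c
#include <stdint.h>

/* Highest priority value the text above describes as effective. */
#define SKETCH_MAX_EFFECTIVE_PRIO 63u

/* Returns the priority that would actually take effect:
 * values of 64 and above are treated as 63. */
static uint32_t effective_priority(uint32_t priority)
{
    if (priority > SKETCH_MAX_EFFECTIVE_PRIO)
        return SKETCH_MAX_EFFECTIVE_PRIO;
    return priority;
}
```

A program that sets priority could apply the same clamp up front so that the stored value matches the one that takes effect.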
ingress_ifindex
This field contains the interface index of the network device this packet arrived on. It may be 0 if a process on the host originated the packet.
ifindex
This field contains the interface index of the network device the packet is currently "on". If a packet has been redirected to another device and an eBPF program is invoked on it again, this field will hold the index of the new device.
On egress this will be the device picked for sending the packet.
tc_index
This field is used to carry Type of Service (TOS) information. This field is populated by the dsmark
qdisc and can subsequently be used with tcindex
filters to classify packets based on their TOS value.
The dsmark qdisc uses the differentiated services (DS) fields in IPv4 (aka DSCP) and IPv6 (aka traffic class) headers.
BPF_PROG_TYPE_SCHED_CLS
programs can also modify this value to implement a custom TOS value extraction from packets.
cb
This field is an array of 5 u32 values with no pre-defined meaning. Network subsystems and eBPF programs can read from and write to this field to share information associated with the socket buffer across programs and subsystem boundaries.
hash
This field contains the hash calculated from the flow information of the packet. The fields used to calculate the hash can differ depending on the protocol. This hash is optionally calculated by network interface devices that support it. 4
tc_classid
This field can be used by BPF_PROG_TYPE_SCHED_CLS
in direct action mode to set the class id. This value is only useful if the program returns a TC_ACT_OK
and the qdisc has classes.
data
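This field contains the pointer to the start address of the linear packet data. Which header this is depends on the program type: for programs that run at the link layer, such as TC programs on Ethernet devices, it is the first byte of the layer 2 header, while for others, such as cgroup socket buffer programs, it is the first byte of the layer 3 header, the type of which is indicated by protocol.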
data_end
This field contains the pointer to the end of the linear packet data, that is, the first byte past it. This pointer is used in combination with data to indicate the accessible data.
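The way data and data_end bound every access can be sketched as below. This is plain C rather than a loadable eBPF program: struct pkt_ctx, ETH_HDR_LEN_SKETCH and read_ethertype are hypothetical names, and the example assumes the linear data starts with an Ethernet header, as is the case for TC programs attached to Ethernet devices. In an eBPF program the verifier rejects any packet read that is not preceded by a comparable check against data_end.

```c
#include <stdint.h>

/* Hypothetical stand-in for the data and data_end fields of __sk_buff. */
struct pkt_ctx {
    uintptr_t data;      /* start of the linear packet data */
    uintptr_t data_end;  /* one past the end of the linear packet data */
};

/* Ethernet header: 6 bytes destination, 6 bytes source, 2 bytes EtherType. */
#define ETH_HDR_LEN_SKETCH 14u

/* Returns the EtherType in host byte order, or -1 if the Ethernet header
 * does not fit in the linear data. */
static int read_ethertype(const struct pkt_ctx *ctx)
{
    const uint8_t *bytes;

    /* Bounds check first: never read at or beyond data_end. */
    if (ctx->data + ETH_HDR_LEN_SKETCH > ctx->data_end)
        return -1;

    bytes = (const uint8_t *)ctx->data;

    /* The EtherType is stored big-endian at offsets 12 and 13. */
    return (bytes[12] << 8) | bytes[13];
}
```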
napi_id
This field contains the id of the NAPI struct this socket buffer came from.
family
This field contains the address family of the socket associated with this socket buffer. Its value is one of the AF_* values defined in include/linux/socket.h.
remote_ip4
The IPv4 address of the remote end of the socket.
local_ip4
The locally bound IPv4 address of the socket.
remote_ip6
The IPv6 address of the remote end of the socket.
local_ip6
The locally bound IPv6 address of the socket.
remote_port
The L4 port number of the remote side of the socket.
local_port
The L4 port number of the local side of the socket.
data_meta
This field contains a pointer to the start of a metadata region in the socket buffer. If no metadata room is set, the value of data_meta and data will be the same. An XDP program can request metadata to be allocated with the bpf_xdp_adjust_meta helper, after which it can write arbitrary data into it.
If the packet with metadata is passed to the kernel, that metadata will be available in the __sk_buff via this pointer. The region lies between data_meta and data.
This means that XDP programs can communicate information to, for example, BPF_PROG_TYPE_SCHED_CLS programs, which can then manipulate the socket buffer to change __sk_buff->mark or __sk_buff->priority on behalf of an XDP program.
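Reading such metadata on the receiving side might look like the sketch below. It is plain C with hypothetical names: struct meta_ctx stands in for the data_meta and data fields, and struct xdp_hint is an assumed layout that both programs would have to agree on. The check mirrors the bounds check an eBPF program needs before touching the region between data_meta and data.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the data_meta and data fields of __sk_buff. */
struct meta_ctx {
    uintptr_t data_meta;  /* start of the metadata region */
    uintptr_t data;       /* start of the packet data, end of the metadata */
};

/* Hypothetical metadata layout shared by the XDP and the TC program. */
struct xdp_hint {
    uint32_t mark;
};

/* Copies the hint into *out and returns 0, or returns -1 when the
 * metadata region is too small to hold a hint (for example when no
 * metadata room was set and data_meta equals data). */
static int read_hint(const struct meta_ctx *ctx, struct xdp_hint *out)
{
    if (ctx->data_meta + sizeof(*out) > ctx->data)
        return -1;

    memcpy(out, (const void *)ctx->data_meta, sizeof(*out));
    return 0;
}
```

On success, a TC program could then copy the mark member of the hint into __sk_buff->mark.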
flow_keys
This field is a pointer to a struct bpf_flow_keys which, as the name implies, holds the keys that identify the network flow of the socket buffer. This field is only accessible from within BPF_PROG_TYPE_FLOW_DISSECTOR programs. More details can be found in its context section.
tstamp
This field indicates the time when this packet should be transmitted in nanoseconds since boot. BPF_PROG_TYPE_SCHED_CLS
programs can set this time to some time in the future to add delay to packets for the purposes of bandwidth limiting or simulating latency. Setting this value only works on egress if the fq
(Fair Queue) qdisc is used.
Note
The fq qdisc has a "drop horizon": if packets are set to transmit too far into the future, they will be dropped to avoid queueing too many packets.
Note
After v5.18 1 / 2 the meaning of this field can also be "received time" and the tstamp_type
field will indicate one or the other.
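Computing a delayed transmit time while staying inside the fq horizon can be sketched as follows. The function name and the horizon_ns parameter are hypothetical; the actual horizon depends on how the fq qdisc is configured, and in an eBPF program now_ns would typically come from a helper such as bpf_ktime_get_ns.

```c
#include <stdint.h>

/* Returns the timestamp to store in tstamp so that the packet is delayed
 * by delay_ns, with the delay capped at horizon_ns so the packet is not
 * scheduled beyond the assumed drop horizon. All values are nanoseconds. */
static uint64_t delayed_tstamp(uint64_t now_ns, uint64_t delay_ns,
                               uint64_t horizon_ns)
{
    if (delay_ns > horizon_ns)
        delay_ns = horizon_ns;

    return now_ns + delay_ns;
}
```

Capping is only one possible policy; a program could equally decide to drop or pass packets whose computed delay exceeds the horizon.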
wire_len
This field contains the length of the data as it will appear on the wire.
gso_segs
This field indicates the number of GSO segments that are contained within the current socket buffer.
sk
This field is a pointer to a struct bpf_sock
which holds information about the socket associated with this socket buffer. More details can be found in the dedicated section.
This field is always read-only.
Docs could be improved
This part of the docs is incomplete, contributions are very welcome
gso_size
This field indicates the size of GSO segments that are contained within the current socket buffer.
tstamp_type
This field indicates what the meaning of tstamp
is. The field can have the following values:
BPF_SKB_TSTAMP_UNSPEC - The tstamp field contains the receive timestamp at ingress and the delivery time at egress.
BPF_SKB_TSTAMP_DELIVERY_MONO - The tstamp field contains the requested time to deliver the packet, see tstamp for details.
hwtstamp
This field contains the time at which the packet was received, as reported by the NIC, if it supports this feature.
Socket
This section describes the fields of the struct bpf_sock type, which is a mirror of the kernel's struct sock type.
bound_dev_if
Docs could be improved
This part of the docs is incomplete, contributions are very welcome
family
Docs could be improved
This part of the docs is incomplete, contributions are very welcome
type
Docs could be improved
This part of the docs is incomplete, contributions are very welcome
protocol
Docs could be improved
This part of the docs is incomplete, contributions are very welcome
mark
Docs could be improved
This part of the docs is incomplete, contributions are very welcome
priority
Docs could be improved
This part of the docs is incomplete, contributions are very welcome
src_ip4
Docs could be improved
This part of the docs is incomplete, contributions are very welcome
src_ip6
Docs could be improved
This part of the docs is incomplete, contributions are very welcome
src_port
Docs could be improved
This part of the docs is incomplete, contributions are very welcome
dst_port
Docs could be improved
This part of the docs is incomplete, contributions are very welcome
dst_ip4
Docs could be improved
This part of the docs is incomplete, contributions are very welcome
dst_ip6
Docs could be improved
This part of the docs is incomplete, contributions are very welcome
state
Docs could be improved
This part of the docs is incomplete, contributions are very welcome
rx_queue_mapping
Docs could be improved
This part of the docs is incomplete, contributions are very welcome