
Program type BPF_PROG_TYPE_SCHED_CLS

v4.1

This program type allows a Traffic Control (TC) classifier (also known as a filter) to be implemented in eBPF. TC serves a number of use cases, all of which involve manipulating traffic. For example, TC is used to implement QoS (Quality of Service), which allows latency-sensitive traffic such as VoIP (Voice over IP) to be processed ahead of, say, web traffic. It can also drop packets to simulate packet loss, add latency to simulate distant clients, or limit bandwidth for applications or users, to name a few.

TC allows an admin to filter traffic using a hierarchical model of qdiscs (queuing disciplines). A root qdisc is attached to a network interface with certain actions. This qdisc can also have child qdiscs, which are used instead of the root if their filter matches the traffic. This program type allows us to implement such a filter in eBPF.

Usage

TC classifier programs are typically put into an ELF section prefixed with tc/ or classifier/. The kernel calls the program with a __sk_buff context. The return value tells the kernel what to do with the packet; its meaning depends on the mode in which the program is attached.

Regular classifier

By default, when a BPF classifier is attached to a qdisc, it acts like any other classifier. It can't take actions such as dropping or redirecting packets. Instead, its return value is used to pick a class based on the contents of the packet. A return value of -1 indicates that the default class should be picked, a return value of 0 means that the filter did not match and the next filter should be tried, and any positive number is the id of the class.

This mode is possible but rarely used; eBPF programs are typically attached in direct action mode.
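The return value convention of a regular classifier can be sketched in plain C as follows. This is a userspace illustration of the decision logic only, not an eBPF program: the function name, the port numbers, and the class handles 1:10 and 1:20 are hypothetical, and a real classifier would extract the protocol and port from the __sk_buff context rather than receive them as arguments.

```c
#include <stdint.h>

/* Return values of a regular (non direct action) classifier. */
#define CLS_DEFAULT_CLASS (-1) /* pick the default class */
#define CLS_NO_MATCH      0    /* no match, let the next filter try */

/* A class id is a 32-bit handle: major number in the upper 16 bits,
 * minor number in the lower 16 bits. */
#define CLASSID(major, minor) ((int)(((uint32_t)(major) << 16) | (uint32_t)(minor)))

/* Hypothetical classes, assumed to exist on the qdisc. */
#define CLASS_VOIP CLASSID(1, 10)
#define CLASS_WEB  CLASSID(1, 20)

/* Hypothetical policy: SIP over UDP goes to the VoIP class, HTTP(S)
 * over TCP goes to the web class, any other TCP/UDP traffic takes the
 * default class, and everything else is left to the next filter. */
static int classify(uint8_t ip_proto, uint16_t dst_port)
{
    const uint8_t proto_tcp = 6, proto_udp = 17;

    if (ip_proto == proto_udp && dst_port == 5060)
        return CLASS_VOIP;
    if (ip_proto == proto_tcp && (dst_port == 80 || dst_port == 443))
        return CLASS_WEB;
    if (ip_proto == proto_tcp || ip_proto == proto_udp)
        return CLS_DEFAULT_CLASS;
    return CLS_NO_MATCH;
}
```

Every class id built this way is positive as long as the major number stays below 0x8000, so it cannot collide with the two special values.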

Direct action

When attached in direct action mode, the eBPF program will act as both a classifier and an action. This mode simplifies setups for the most common use cases where we just want to always execute an action. In direct action mode the return value can be one of:

  • TC_ACT_UNSPEC (-1) - Signals that the default configured action should be taken.
  • TC_ACT_OK (0) - Signals that the packet should proceed.
  • TC_ACT_RECLASSIFY (1) - Signals that the packet has to re-start classification from the root qdisc. This is typically used after modifying the packet so its classification might have different results.
  • TC_ACT_SHOT (2) - Signals that the packet should be dropped; no further TC processing happens.
  • TC_ACT_PIPE (3) - While defined, this action should not be used and holds no particular meaning for eBPF classifiers.
  • TC_ACT_STOLEN (4) - While defined, this action should not be used and holds no particular meaning for eBPF classifiers.
  • TC_ACT_QUEUED (5) - While defined, this action should not be used and holds no particular meaning for eBPF classifiers.
  • TC_ACT_REPEAT (6) - While defined, this action should not be used and holds no particular meaning for eBPF classifiers.
  • TC_ACT_REDIRECT (7) - Signals that the packet should be redirected. The details of how and where are set as side effects by helper functions.

Classifiers in direct action mode can still set a class id by writing to the tc_classid field of the context.
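A minimal sketch of a direct action decision, written as plain userspace C so that it can be compiled and tested without a BPF toolchain. The struct below is a hypothetical stand-in for the two __sk_buff fields the sketch touches, and the firewall mark and class handle are assumptions made for illustration; a real program would receive a struct __sk_buff pointer from the kernel.

```c
#include <stdint.h>

/* Return codes, with the values listed above. */
#define TC_ACT_OK   0
#define TC_ACT_SHOT 2

/* Hypothetical stand-in for the context fields used here. */
struct skb_view {
    uint32_t mark;
    uint32_t tc_classid;
};

#define DROP_MARK    0xdeadu            /* hypothetical firewall mark */
#define CLASSID_BULK ((1u << 16) | 20u) /* hypothetical class 1:20 */

/* Acts as classifier and action in one step: marked packets are
 * dropped, all others are tagged with a class id and allowed through. */
static int direct_action(struct skb_view *skb)
{
    if (skb->mark == DROP_MARK)
        return TC_ACT_SHOT;

    skb->tc_classid = CLASSID_BULK;
    return TC_ACT_OK;
}
```

The class id is written before returning TC_ACT_OK, so the verdict and the classification come from the same program run.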

Context

This program type may not read or write every field of the context, either because doing so might break assumptions in the kernel or because the data is not available at the point where the program is hooked into the kernel.

Context fields
Field Read Write
len
pkt_type
mark
queue_mapping
protocol
vlan_present
vlan_tci
vlan_proto
priority
ingress_ifindex
ifindex
tc_index
cb
hash
tc_classid
data
data_end
napi_id
family
remote_ip4
local_ip4
remote_ip6
local_ip6
remote_port
local_port
data_meta
flow_keys
tstamp
wire_len
gso_segs
sk
gso_size
tstamp_type
hwtstamp
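The data and data_end fields delimit the packet bytes a program can access directly, and the verifier rejects any packet access that is not preceded by a bounds check against data_end. The check can be sketched in plain C as below. This is a userspace illustration with a hypothetical function name; in an eBPF program the two pointers would be obtained by casting skb->data and skb->data_end.

```c
#include <stdint.h>

#define ETH_HDR_LEN 14 /* destination MAC (6) + source MAC (6) + EtherType (2) */

/* Returns the EtherType in host byte order, or -1 if the packet is too
 * short to contain a full Ethernet header. */
static int read_ethertype(const uint8_t *data, const uint8_t *data_end)
{
    /* Bounds check first: nothing below may read past data_end. */
    if (data + ETH_HDR_LEN > data_end)
        return -1;

    /* The EtherType is stored big-endian at byte offsets 12 and 13. */
    return (data[12] << 8) | data[13];
}
```

Reading the field byte by byte avoids any dependence on host endianness or alignment, at the cost of not using the usual header structs.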

Attachment

As of kernel v6.2, the only way to attach eBPF programs to TC is via a netlink socket, the details of which are complex. If you wish to manage attachment via an API, using a netlink library is recommended. However, the most common approach is the iproute2 tc CLI tool, the standard implementation of network utilities that use the netlink protocol.

The most basic example of attaching a TC classifier is:

# Add a qdisc of type `clsact` to device `eth1`
$ tc qdisc add dev eth1 clsact
# Load the `program.o` ELF file, and attach the `my_func` section to the qdisc of eth1 on the ingress side.
$ tc filter add dev eth1 ingress bpf obj program.o sec my_func

For more details on the tc command, see the general man page.

For more details on the bpf filter options, see the tc-bpf man page.

In addition, since v6.6 the kernel supports tcx (the new tc BPF fast path with BPF link support), which allows for more advanced features like attaching multiple programs to a single qdisc, or attaching programs to a qdisc on the egress side. The following table lists the ELF section names and attach types for this program type:

+-------------------------------------------+----------------------------------------+----------------------------------+-----------+
| Program Type                              | Attach Type                            | ELF Section Name                 | Sleepable |
+===========================================+========================================+==================================+===========+
| ``BPF_PROG_TYPE_SCHED_CLS``               |                                        | ``classifier`` [#tc_legacy]_     |           |
+                                           +                                        +----------------------------------+-----------+
|                                           |                                        | ``tc`` [#tc_legacy]_             |           |
+                                           +----------------------------------------+----------------------------------+-----------+
|                                           | ``BPF_NETKIT_PRIMARY``                 | ``netkit/primary``               |           |
+                                           +----------------------------------------+----------------------------------+-----------+
|                                           | ``BPF_NETKIT_PEER``                    | ``netkit/peer``                  |           |
+                                           +----------------------------------------+----------------------------------+-----------+
|                                           | ``BPF_TCX_INGRESS``                    | ``tc/ingress``                   |           |
+                                           +----------------------------------------+----------------------------------+-----------+
|                                           | ``BPF_TCX_EGRESS``                     | ``tc/egress``                    |           |
+                                           +----------------------------------------+----------------------------------+-----------+
|                                           | ``BPF_TCX_INGRESS``                    | ``tcx/ingress``                  |           |
+                                           +----------------------------------------+----------------------------------+-----------+
|                                           | ``BPF_TCX_EGRESS``                     | ``tcx/egress``                   |           |
+-------------------------------------------+----------------------------------------+----------------------------------+-----------+

The definition of return codes for tcx programs can be found in the kernel sources:

/* (Simplified) user return codes for tcx prog type.
 * A valid tcx program must return one of these defined values. All other
 * return codes are reserved for future use. Must remain compatible with
 * their TC_ACT_* counter-parts. For compatibility in behavior, unknown
 * return codes are mapped to TCX_NEXT.
 */
enum tcx_action_base {
    TCX_NEXT    = -1,
    TCX_PASS    = 0,
    TCX_DROP    = 2,
    TCX_REDIRECT    = 7,
};
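The rule stated in the comment above, that unknown return codes are mapped to TCX_NEXT, can be sketched as a small normalization function. The function name is hypothetical, and this is an illustration of the documented behavior rather than the kernel's own code.

```c
/* Return codes for tcx programs, as defined in the kernel sources. */
enum tcx_action_base {
    TCX_NEXT     = -1,
    TCX_PASS     = 0,
    TCX_DROP     = 2,
    TCX_REDIRECT = 7,
};

/* Maps an arbitrary program return value to one of the four defined
 * tcx actions. Anything that is not a defined value becomes TCX_NEXT. */
static int tcx_normalize(int code)
{
    switch (code) {
    case TCX_PASS:
    case TCX_DROP:
    case TCX_REDIRECT:
        return code;
    case TCX_NEXT:
    default:
        return TCX_NEXT;
    }
}
```

Under this rule a program that returns 1 (TC_ACT_RECLASSIFY in the classic TC scheme) is treated as if it had returned TCX_NEXT.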

For more details on tcx, see the LSFMM+BPF Summit Recap and Video: Revamping Global Socket Iterator, Netkit and Next Steps.

Helper functions

Not all helper functions are available in all program types. These are the helper calls available for TC classifier programs:

Supported helper functions

KFuncs

Supported kfuncs