Notes on eBPF

Prerequisites

Software

BCC's repo

References

"Learning eBPF" by Liz Rice

header file that defines some useful constants

kernel.org's document

eBPF's in-depth explanation

eBPF's portal website

BCC's document

Getting hands dirty

What is eBPF

It's a mechanism that lets one dynamically load and run a piece of VERIFIED code within the kernel, without changing the kernel source code or loading kernel modules.

Means that:

The eBPF programs are event-driven and run when a hook is triggered.

Frontends

Many eBPF frontends exist:

I chose BCC because I already know Python, and the examples in the book I am using are written with BCC.

BPF Maps

BPF maps are data structures used to share data between eBPF programs, or between an eBPF program and a userland program.

Helper functions

Since eBPF programs are forbidden to call arbitrary kernel functions, the kernel provides several helper functions.

Helper functions can improve security and make the eBPF code kernel version agnostic.

First example

Source code

from bcc import BPF
prog = """
int hello(void *ctx) {
    bpf_trace_printk("Hello World!");
    return 0;
}
"""
b = BPF(text=prog)
syscall = b.get_syscall_fnname("execve")
b.attach_kprobe(event=syscall, fn_name="hello")
b.trace_print()

Explanation

Notes

This example demonstrates simple output.

Every program that calls bpf_trace_printk writes to the same trace pipe, so if several such eBPF programs run at once, their output is interleaved and will likely appear out of order.

Second example

Source code

I omitted the parts that are identical to the first example.

C part

BPF_HASH(counter_table);
int hello(void *ctx) {
    u64 uid = bpf_get_current_uid_gid() & 0xFFFFFFFF;
    u64 *p = counter_table.lookup(&uid);
    u64 counter = 0;
    if (p != 0) {
        counter = *p;
    }
    counter++;
    counter_table.update(&uid, &counter);
    return 0;
}

Python part

from time import sleep

while True:
    sleep(2)
    s = ""
    # Each key is a user ID, each value is that user's execve count.
    for k, v in b["counter_table"].items():
        s += f"ID {k.value}: {v.value}\t"
    print(s)

Explanation

C part

Notes

This example demonstrates BPF maps.

Note that counter_table.lookup() is not valid C. BCC rewrites such calls into proper helper-function calls before compiling the code.
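The lookup-then-update pattern in the C part can be mimicked in plain Python to see what ends up in the map. This is only a userland sketch: the dict counter_table, the function hello, and the sample events are stand-ins chosen for illustration, and nothing here calls BCC or the kernel.

```python
# Plain-Python analogy of the C part; nothing here touches the kernel.
counter_table = {}  # stands in for BPF_HASH(counter_table)

def hello(uid_gid):
    """Mimic the eBPF program for one execve event."""
    # The helper packs the GID in the upper 32 bits and the UID in the lower 32.
    uid = uid_gid & 0xFFFFFFFF
    p = counter_table.get(uid)      # like counter_table.lookup(&uid)
    counter = 0
    if p is not None:               # like the null check on the pointer
        counter = p
    counter += 1
    counter_table[uid] = counter    # like counter_table.update(&uid, &counter)
    return 0

# Hypothetical events, each encoded as (gid << 32) | uid.
events = [(1000 << 32) | 1000, (0 << 32) | 0, (1000 << 32) | 1000]
for e in events:
    hello(e)

print("\t".join(f"ID {k}: {v}" for k, v in counter_table.items()))
```

The first event for a given UID finds no entry and stores 1; later events read the stored value and write it back incremented.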

Here are some of the map types.

See /usr/include/linux/bpf.h for the full list.

Generic maps list:

What is PERCPU?

PERCPU marks the per-CPU variants of a map type, which is to say that the kernel uses a different block of memory for each CPU core's version of that map.
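To make the per-CPU idea concrete, here is a small userland simulation. It assumes a made-up machine with four cores and a single hypothetical key; a real per-CPU map lives in the kernel, so treat this purely as a model of the layout.

```python
# Userland simulation of a per-CPU map; NUM_CPUS and the key are made up.
NUM_CPUS = 4

# One independent slot per CPU for each key, as a per-CPU map would keep.
percpu_counter = {"execve": [0] * NUM_CPUS}

def increment(key, cpu):
    """Each CPU only touches its own slot, so cores do not contend."""
    percpu_counter[key][cpu] += 1

# Pretend the hook fired on these CPUs, in this order.
for cpu in [0, 1, 1, 3, 1, 0]:
    increment("execve", cpu)

# A reader gets one value per CPU and aggregates them itself.
total = sum(percpu_counter["execve"])
print(percpu_counter["execve"], total)
```

The trade-off is that a reader has to combine the per-core values, for example by summing them, to get the overall figure.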

What are LPM and TRIE?

LPM stands for longest prefix match.

A trie is a prefix tree.

The name comes from the middle syllable of "retrieval".
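Longest prefix match is easiest to see with IP prefixes. The sketch below uses Python's standard ipaddress module and a hypothetical three-entry table; it scans linearly rather than walking a real trie, so it shows the lookup semantics only, not how the kernel stores the data.

```python
import ipaddress

# Hypothetical table: prefix -> label.
table = {
    ipaddress.ip_network("10.0.0.0/8"): "corp",
    ipaddress.ip_network("10.1.0.0/16"): "lab",
    ipaddress.ip_network("0.0.0.0/0"): "default",
}

def lpm_lookup(addr):
    """Return the value whose prefix matches addr with the most bits."""
    ip = ipaddress.ip_address(addr)
    matches = [net for net in table if ip in net]
    if not matches:
        return None
    # Among all matching prefixes, the longest one wins.
    best = max(matches, key=lambda net: net.prefixlen)
    return table[best]

print(lpm_lookup("10.1.2.3"))
print(lpm_lookup("10.9.9.9"))
print(lpm_lookup("8.8.8.8"))
```

An address inside 10.1.0.0/16 also matches 10.0.0.0/8 and 0.0.0.0/0, but the /16 entry is chosen because it is the most specific.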

Non-generic maps list:

Note that BPF_HISTOGRAM is not implemented in the kernel but in BCC.

See this pull request that implements the BPF_HISTOGRAM

Third example

Source code

C part

BPF_PERF_OUTPUT(output);
struct data_t {
    int pid;
    int uid;
    char command[16];
    char message[12];
};
int hello(void *ctx) {
    struct data_t data = {};
    char message[12] = "Hello World";
    data.pid = bpf_get_current_pid_tgid() >> 32;
    data.uid = bpf_get_current_uid_gid() & 0xFFFFFFFF;
    bpf_get_current_comm(&data.command, sizeof(data.command));
    bpf_probe_read_kernel(&data.message, sizeof(data.message), message);
    output.perf_submit(ctx, &data, sizeof(data));
    return 0;
}

Python part

def print_event(cpu, data, size):
    data = b["output"].event(data)
    print(f"{data.pid} {data.uid} {data.command.decode()} " + \
    f"{data.message.decode()}")

b["output"].open_perf_buffer(print_event)
while True:
    b.perf_buffer_poll()

Explanation

BPF_PERF_OUTPUT(): creates a map for passing output to user space. It is a better option than bpf_trace_printk.

perf_submit(): puts data into the map.
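On the Python side, the record arrives as raw bytes laid out like struct data_t. The sketch below shows, with ctypes alone, roughly what decoding such a record involves. The Data class and the sample values are assumptions made up for illustration; in the real script b["output"].event(data) does the decoding for you.

```python
import ctypes

class Data(ctypes.Structure):
    # Same field order and sizes as struct data_t in the C part.
    _fields_ = [
        ("pid", ctypes.c_int),
        ("uid", ctypes.c_int),
        ("command", ctypes.c_char * 16),
        ("message", ctypes.c_char * 12),
    ]

# Made-up 64-bit value: process ID in the upper 32 bits, thread ID in the lower.
pid_tgid = (4242 << 32) | 4250

# Build a fake record the way the C part fills it in.
sample = Data(pid=pid_tgid >> 32, uid=1000, command=b"bash", message=b"Hello World")
raw = bytes(sample)  # the bytes user space would receive

# Reinterpret the raw bytes as the structure again.
event = Data.from_buffer_copy(raw)
print(f"{event.pid} {event.uid} {event.command.decode()} {event.message.decode()}")
```

Because the record is a fixed-layout structure rather than a formatted string, user space can read each field separately, which is one reason this approach scales better than bpf_trace_printk.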

Notes

A better way to output data is to use BPF_RINGBUF_OUTPUT.

See this doc

BPF_PERF_OUTPUT is different from BPF_PERF_ARRAY. They are not the same thing.

At this point, we have some fundamental knowledge of eBPF, specifically BCC.

A step further: defining multiple functions

Source code

Variant 1

from bcc import BPF
prog = """
static void world() {
    bpf_trace_printk("world");
}
int hello(void* ctx) {
    bpf_trace_printk("hello");
    world();
    return 0;
}
"""

b = BPF(text=prog)
syscall = b.get_syscall_fnname("execve")
b.attach_kprobe(event=syscall, fn_name="hello")
b.trace_print()

Variant 2

from bcc import BPF
import ctypes
prog = """
BPF_PROG_ARRAY(funcs, 300);
int world(void* ctx) {
    bpf_trace_printk("world");
    return 0;
}
int hello(void* ctx) {
    bpf_trace_printk("hello");
    funcs.call(ctx, 1);
    return 0;
}
"""

b = BPF(text=prog)
syscall = b.get_syscall_fnname("execve")
b.attach_kprobe(event=syscall, fn_name="hello")
world_fn = b.load_func("world", BPF.KPROBE)
prog_array = b.get_table("funcs")
prog_array[ctypes.c_int(1)] = ctypes.c_int(world_fn.fd)
b.trace_print()
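Variant 2 relies on a tail call: funcs.call(ctx, 1) jumps to the program stored at index 1 of the program array and, on success, never returns to hello. If the slot is empty, execution simply continues in the caller. The plain-Python sketch below imitates that control flow; the TailCall exception, the run function, and the log list are illustrative devices, not part of BCC.

```python
# Userland imitation of tail-call control flow; not real eBPF.
class TailCall(Exception):
    """Raised to abandon the current program and jump to another one."""
    def __init__(self, target):
        self.target = target

funcs = {}  # stands in for BPF_PROG_ARRAY(funcs, 300)
log = []    # stands in for the trace pipe

def tail_call(index):
    target = funcs.get(index)
    if target is not None:
        raise TailCall(target)  # success: the caller does not resume
    # Empty slot: fall through and let the caller continue.

def world(ctx):
    log.append("world")
    return 0

def hello(ctx):
    log.append("hello")
    tail_call(1)
    log.append("hello continues")  # reached only if the tail call failed
    return 0

def run(prog, ctx=None):
    """Run a program, following tail calls until one returns normally."""
    while True:
        try:
            return prog(ctx)
        except TailCall as jump:
            prog = jump.target

run(hello)        # slot 1 is still empty
funcs[1] = world  # like prog_array[1] = world_fn.fd
run(hello)        # slot 1 now holds world
print(log)
```

This is also why the Python part of Variant 2 has to fill slot 1 of the program array: until it does, hello just runs to its own return statement.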


Except where otherwise noted, this site is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License