Okay, this requires a bit of explanation.
The reason for this emulation is to allow us to have seamless updates of
the devices cgroup for running containers. This was triggered by several
users having issues where our initial writing of a deny-all rule (in all
cases) results in spurrious errors.
The obvious solution would be to just remove the deny-all rule, right?
Well, it turns out that runc doesn't actually control the deny-all rule
because all users of runc have explicitly specified their own deny-all
rule for many years. This appears to have been done to work around a bug
in runc (which this series has fixed in [1]) where we would actually act
as a black-list despite this being a violation of the OCI spec.
This means that not adding our own deny-all rule in the case of updates
won't solve the issue. However, it will also not solve the issue in
several other cases (the most notable being where a container is being
switched between default-permission modes).
So in order to handle all of these cases, a way of tracking the relevant
internal cgroup state (given a certain state of "cgroups.list" and a set
of rules to apply) is necessary. That is the purpose of DevicesEmulator.
Reading "devices.list" is quite important because that's the only way we
can tell if it's safe to skip the troublesome deny-all rules without
making potentially-dangerous assumptions about the container.
We also are currently bug-compatible with the devices cgroup (namely,
removing rules that don't exist or having superfluous rules all works as
with the in-kernel implementation). The only exception to this is that
we give an error if a user requests to revoke part of a wildcard
exception, because allowing such configurations could result in security
holes (cgroupv1 silently ignores such rules, meaning in white-list mode
that the access is still permitted).
[1]: b2bec9806f ("cgroup: devices: eradicate the Allow/Deny lists")
Signed-off-by: Aleksa Sarai <asarai@suse.de>