While all modern kernels (and I do mean _all_ of them -- this syscall
was added in 2.6.10 before git had begun development!) have support for
this syscall, LXC has a default seccomp profile that returns ENOSYS for
this syscall. For most syscalls this would be a deal-breaker, and our
use of session keyrings is security-based there are a few mitigating
factors that make this change not-completely-insane:
* We already have a flag that disables the use of session keyrings
(for older kernels that had system-wide keyring limits and so
on). So disabling it is not a new idea.
* While the primary justification of using session keys *is*
security-based, it's more of a security-by-obscurity protection.
The main defense keyrings have is VFS credentials -- which is
something that users already have better security tools for
(setuid(2) and user namespaces).
* Given the security justification you might argue that we
shouldn't silently ignore this. However, the only way for the
kernel to return -ENOSYS is either being ridiculously old (at
which point we wouldn't work anyway) or that there is a seccomp
profile in place blocking it.
Given that the seccomp profile (if malicious) could very easily
just return 0 or a silly return code (or something even more
clever with seccomp-bpf) and trick us without this patch, there
isn't much of a significant change in how much seccomp can trick
us with or without this patch.
Given all of that over-analysis, I'm pretty convinced there isn't a
security problem in this very specific case and it will help out the
ChromeOS folks by allowing Docker to run inside their LXC container
setup. I'd be happy to be proven wrong.
Ref: https://bugs.chromium.org/p/chromium/issues/detail?id=860565
Signed-off-by: Aleksa Sarai <asarai@suse.de>