The hugetlb cgroup control files (introduced here in 2012:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=abb8206cb0773)
use "KB" and not "kB"
(https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/mm/hugetlb_cgroup.c?h=v5.0#n349).
The behavior in the kernel has not changed since the introduction, and
the current code using "kB" will therefore fail on devices with small
amounts of RAM (see
https://github.com/kubernetes/kubernetes/issues/77169) running a kernel
with the config flag CONFIG_HUGETLBFS=y.
As seen from the code in "mem_fmt" inside hugetlb_cgroup.c, only "KB",
"MB" and "GB" are used, so the others may be removed as well.
Here is a real-world example of the entries inside the
"/sys/kernel/mm/hugepages/" directory:
- "hugepages-64kB"
- "hugepages-2048kB"
- "hugepages-32768kB"
- "hugepages-1048576kB"
And the corresponding cgroup files:
- "hugetlb.64KB._____"
- "hugetlb.2MB._____"
- "hugetlb.32MB._____"
- "hugetlb.1GB._____"
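The mapping between the two lists above can be sketched in Go as follows. This is only an illustration of the KB/MB/GB scheme described in this message, not the code from the patch; the function name hugePageSizeString is hypothetical.

```go
package main

import "fmt"

// hugePageSizeString converts a huge page size given in KiB (the number in a
// "hugepages-<N>kB" directory name) into the size string used in the hugetlb
// cgroup file names. The name is hypothetical; the thresholds follow the
// KB/MB/GB scheme described in the commit message.
func hugePageSizeString(sizeKiB uint64) string {
	size := sizeKiB << 10 // size in bytes
	switch {
	case size >= 1<<30:
		return fmt.Sprintf("%dGB", size>>30)
	case size >= 1<<20:
		return fmt.Sprintf("%dMB", size>>20)
	default:
		return fmt.Sprintf("%dKB", size>>10)
	}
}

func main() {
	for _, kib := range []uint64{64, 2048, 32768, 1048576} {
		fmt.Printf("hugepages-%dkB -> hugetlb.%s\n", kib, hugePageSizeString(kib))
	}
}
```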
Signed-off-by: Odin Ugedal <odin@ugedal.com>
When runc is started as a `Type=notify` systemd service,
runc opens up its own listening socket inside the container
to act as a proxy between the container and systemd for passing
notify messages.
However, the domain socket that runc creates is only writable by the user
running runc, so if the container runs with a different UID/GID, nothing
inside the container can write to the socket.
The fix is to change the permissions of the notify listener socket to 0777.
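A minimal sketch of the idea, assuming a unix datagram socket created with the Go standard library. The helper names listenNotifySocket and runExample and the socket file name are hypothetical and are not taken from the patch.

```go
package main

import (
	"fmt"
	"net"
	"os"
	"path/filepath"
)

// listenNotifySocket (hypothetical name) creates a unix datagram socket at
// path and then opens up its permissions so that any UID/GID can write to it.
func listenNotifySocket(path string) (*net.UnixConn, error) {
	conn, err := net.ListenUnixgram("unixgram", &net.UnixAddr{Name: path, Net: "unixgram"})
	if err != nil {
		return nil, err
	}
	if err := os.Chmod(path, 0777); err != nil {
		conn.Close()
		return nil, err
	}
	return conn, nil
}

// runExample creates the socket in a temporary directory and reports the
// permission bits it ends up with.
func runExample() (os.FileMode, error) {
	dir, err := os.MkdirTemp("", "notify")
	if err != nil {
		return 0, err
	}
	defer os.RemoveAll(dir)

	path := filepath.Join(dir, "notify.sock")
	conn, err := listenNotifySocket(path)
	if err != nil {
		return 0, err
	}
	defer conn.Close()

	info, err := os.Stat(path)
	if err != nil {
		return 0, err
	}
	return info.Mode().Perm(), nil
}

func main() {
	mode, err := runExample()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("socket permissions: %o\n", mode)
}
```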
Signed-off-by: Joe Burianek <joe.burianek@pantheon.io>
This will permit us to extend the internals of systemd.Manager to include
further information about the system, such as whether cgroupv1, cgroupv2 or
both are in effect.
Furthermore, it allows a future refactor that moves more of the UseSystemd()
code into the factory initialization function.
Signed-off-by: Filipe Brandenburger <filbranden@gmail.com>
Minor refactoring to use the filePair struct for both the init sock and the log pipe.
Co-authored-by: Julia Nedialkova <julianedialkova@hotmail.com>
Signed-off-by: Georgi Sabev <georgethebeatle@gmail.com>
In the error handling of initProcess.start(), we need to add the
missing IntelRdtManager.Destroy() call to the deferred function.
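The pattern can be sketched as below. This is a simplified illustration, not the code from initProcess.start(): the resourceManager type, the start function and its fail parameter are hypothetical stand-ins.

```go
package main

import (
	"errors"
	"fmt"
)

// resourceManager is a hypothetical stand-in for managers such as the cgroup
// manager or IntelRdtManager; only Destroy matters for this sketch.
type resourceManager struct {
	destroyed bool
}

// Destroy records that the manager's resources were torn down.
func (m *resourceManager) Destroy() error {
	m.destroyed = true
	return nil
}

// start mimics the shape of a start function with a named error result: if
// any step fails, the deferred function destroys every manager, including
// the Intel RDT one.
func start(cgroups, intelRdt *resourceManager, fail bool) (err error) {
	defer func() {
		if err != nil {
			cgroups.Destroy()
			intelRdt.Destroy()
		}
	}()
	if fail {
		return errors.New("simulated failure during start")
	}
	return nil
}

func main() {
	cgroups, intelRdt := &resourceManager{}, &resourceManager{}
	err := start(cgroups, intelRdt, true)
	fmt.Println("error:", err, "intelRdt destroyed:", intelRdt.destroyed)
}
```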
Signed-off-by: Xiaochen Shen <xiaochen.shen@intel.com>
Bump logrus so that we can use logrus.StandardLogger().Logf instead
Co-authored-by: Julia Nedialkova <julianedialkova@hotmail.com>
Signed-off-by: Georgi Sabev <georgethebeatle@gmail.com>
We discovered in umoci that setting a dummy type of "none" would result
in file-based bind-mounts no longer working properly, which is caused by
a restriction for when specconv will change the device type to "bind" to
work around rootfs_linux.go's ... issues.
However, bind-mounts don't have a type (and Linux will ignore any type
specifier you give it) because the type is copied from the source of the
bind-mount. So we should always overwrite it to avoid user confusion.
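A minimal sketch of the "always overwrite" rule, assuming a simplified mount description. The mount struct and the deviceType function are hypothetical and do not mirror specconv's actual types.

```go
package main

import "fmt"

// mount is a hypothetical, simplified mount description.
type mount struct {
	Source      string
	Destination string
	Type        string
	Options     []string
}

// deviceType returns "bind" whenever the options request a bind-mount,
// regardless of the type the user specified (for example a dummy "none").
// Any other mount keeps its configured type.
func deviceType(m mount) string {
	for _, opt := range m.Options {
		if opt == "bind" || opt == "rbind" {
			return "bind"
		}
	}
	return m.Type
}

func main() {
	m := mount{
		Source:      "/etc/hosts",
		Destination: "/etc/hosts",
		Type:        "none",
		Options:     []string{"rbind", "ro"},
	}
	fmt.Println(deviceType(m))
}
```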
Signed-off-by: Aleksa Sarai <asarai@suse.de>
Refactor configuring logging into a reusable component
so that it can be used in both main() and the init process's init().
Co-authored-by: Georgi Sabev <georgethebeatle@gmail.com>
Co-authored-by: Giuseppe Capizzi <gcapizzi@pivotal.io>
Co-authored-by: Claudia Beresford <cberesford@pivotal.io>
Signed-off-by: Danail Branekov <danailster@gmail.com>
Add support for logging from child processes (including nsexec).
A pipe is used to send logs from the children to the parent in JSON.
The JSON format is the same as the one produced by the logrus JSON
formatter, i.e. child processes can use the standard logrus APIs.
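A sketch of the parent side, assuming one JSON object per line with the default logrus key names "level" and "msg". The logEntry type and the readLogs and roundTrip helpers are hypothetical, and the child is simulated by writing to the pipe directly.

```go
package main

import (
	"bufio"
	"encoding/json"
	"fmt"
	"io"
	"os"
)

// logEntry holds the two fields of a JSON log line that this sketch uses.
type logEntry struct {
	Level string `json:"level"`
	Msg   string `json:"msg"`
}

// readLogs decodes one JSON log entry per line until the reader is exhausted.
func readLogs(r io.Reader) ([]logEntry, error) {
	var entries []logEntry
	scanner := bufio.NewScanner(r)
	for scanner.Scan() {
		var e logEntry
		if err := json.Unmarshal(scanner.Bytes(), &e); err != nil {
			return entries, err
		}
		entries = append(entries, e)
	}
	return entries, scanner.Err()
}

// roundTrip simulates a child writing two log lines into a pipe and the
// parent reading them back. The messages are small enough to fit in the
// pipe buffer, so writing before reading does not block.
func roundTrip() ([]logEntry, error) {
	r, w, err := os.Pipe()
	if err != nil {
		return nil, err
	}
	defer r.Close()
	fmt.Fprintln(w, `{"level":"debug","msg":"child started"}`)
	fmt.Fprintln(w, `{"level":"warning","msg":"something odd"}`)
	w.Close() // closing the write end lets the reader see EOF
	return readLogs(r)
}

func main() {
	entries, err := roundTrip()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, e := range entries {
		fmt.Printf("[%s] %s\n", e.Level, e.Msg)
	}
}
```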
Signed-off-by: Marco Vedovati <mvedovati@suse.com>
On some machines, when setting the SELinux key labels to "", we are seeing
failures that cause runc to fail, even if SELinux is disabled.
This check makes the SELinux Set*Label functions ignore calls with ""
when SELinux is disabled.
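The check can be sketched as follows. This is an illustration only: writeLabel, its parameters and failingWrite are hypothetical, with enabled standing in for the result of an "is SELinux enabled" query and write for the function that performs the actual label write.

```go
package main

import (
	"errors"
	"fmt"
)

// errWrite simulates the failure seen when writing a label on a machine
// where SELinux is disabled.
var errWrite = errors.New("simulated label write failure")

// failingWrite is a stand-in for the real label write; it always fails.
func failingWrite(label string) error {
	return errWrite
}

// writeLabel skips the write entirely when the label is empty and SELinux is
// disabled; in every other case it performs the write.
func writeLabel(enabled bool, label string, write func(string) error) error {
	if label == "" && !enabled {
		return nil
	}
	return write(label)
}

func main() {
	fmt.Println("disabled, empty label:", writeLabel(false, "", failingWrite))
	fmt.Println("enabled, empty label: ", writeLabel(true, "", failingWrite))
}
```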
Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
Whenever processes are spawned using nsexec, a zombie runc:[1:CHILD]
process is always created and needs to be reaped by the parent.
Signed-off-by: Alex Fang <littlelightlittlefire@gmail.com>
The additional test shows up as a separate job. It sets the environment
variable RUNC_USE_SYSTEMD=1 so that it is clear in Travis-CI that this job
is testing the systemd cgroup driver.
Signed-off-by: Filipe Brandenburger <filbranden@google.com>
These tests sometimes hang, so let's skip them for now.
Tested:
$ sudo make localintegration TESTPATH='/checkpoint.bats' RUNC_USE_SYSTEMD=1
The 5 tests in this test suite will be skipped.
Signed-off-by: Filipe Brandenburger <filbranden@google.com>
When $RUNC_USE_SYSTEMD is set, use the systemd syntax for the
cgroupsPath. Also fix $CGROUPS_PATH to look under the actual path to the
slice/scope created by systemd.
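For illustration, the systemd syntax can be sketched as below, assuming the "slice:prefix:name" form of cgroupsPath. The function parseSystemdCgroupsPath is hypothetical, and the "prefix-name.scope" unit naming is an assumption of this sketch rather than something stated in this commit.

```go
package main

import (
	"fmt"
	"strings"
)

// parseSystemdCgroupsPath (hypothetical name) splits a "slice:prefix:name"
// cgroupsPath into the parent slice and a scope unit name. The
// "prefix-name.scope" naming is an assumption of this sketch.
func parseSystemdCgroupsPath(path string) (slice, unit string, err error) {
	parts := strings.Split(path, ":")
	if len(parts) != 3 {
		return "", "", fmt.Errorf("expected slice:prefix:name, got %q", path)
	}
	return parts[0], parts[1] + "-" + parts[2] + ".scope", nil
}

func main() {
	slice, unit, err := parseSystemdCgroupsPath("machine.slice:runc:test_busybox")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(slice, unit)
}
```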
Tested:
$ sudo make localintegration TESTPATH='/cgroups.bats' RUNC_USE_SYSTEMD=1
That test will fail without this commit.
Signed-off-by: Filipe Brandenburger <filbranden@google.com>
This allows us to test runc using libcontainer's systemd driver, by
passing an extra `--systemd-cgroup` argument to the calls to runc.
Tested:
$ sudo make localintegration TESTPATH='/exec.bats' RUNC_USE_SYSTEMD=1
Confirmed that systemd was in use by looking at the creation and removal
of libcontainer_<pid>_systemd_test_default.slice test slices. Also
introduced a breakage in the systemd cgroup driver and confirmed that the
tests failed as expected.
Signed-off-by: Filipe Brandenburger <filbranden@google.com>
secure_getenv is a glibc extension, so this code no longer compiles
on musl libc after this patch.
secure_getenv is only intended for use in setuid binaries, so that
they do not trust their environment. It simply returns NULL if the
binary is running setuid. If runc were installed setuid, the user
could already do anything as root, so the game would be over anyway
and this check is not needed.
Signed-off-by: Justin Cormack <justin.cormack@docker.com>