Using GetPaths from cgroupv2 unified hierarchy code is deprecated
and this function will (hopefully) be removed.
Use GetUnifiedPath() for v2 case.
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Tested that the EINTR is still being detected:
> $ go1.14 test -c # 1.14 is needed for EINTR to happen
> $ sudo ./fscommon.test
> INFO[0000] interrupted while writing 1063068 to /sys/fs/cgroup/memory/test-eint-89293785/memory.limit_in_bytes
> PASS
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
When performing checkpoint or restore of cgroupv2 unified hierarchy,
there is no need to call getCgroupMounts() / cgroups.GetCgroupMounts()
as there's only a single mount in there.
This eliminates the last internal (i.e. runc) use case of
cgroups.GetCgroupMounts() for v2 unified. Unfortunately, there
are external ones (e.g. moby/moby) so we can't yet let it
return an error.
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
It turns out that ioutil.Readfile wraps the error in a *os.PathError.
Since we cannot guarantee compilation with golang >= v1.13, we are
manually unwrapping the error.
Signed-off-by: Kieron Browne <kbrowne@pivotal.io>
The Travis tests running on Fedora 31 with cgroup2 on Vagrant had the
CRIU parts disabled because of a couple of problems.
One problem was a bug in runc and CRIU handling that Andrei fixed.
In addition four patches from the upcoming CRIU 3.14 are needed for
minimal cgroup2 support (freezer and mounting of cgroup2). With Andrei's
fix and the CRIU cgroup2 support and the runc CRIU cgroup2 integration
it is now possible the checkpoint integration tests again on the Fedora
Vagrant cgroup2 based integration test.
To run CRIU based tests the modules of Fedora 31 (the test host system)
are mounted inside of the container used to test runc in the buster
based container with -v /lib/modules:/lib/modules.
Signed-off-by: Adrian Reber <areber@redhat.com>
The newest CRIU version supports freezer v2 and this tells runc
to use it if new enough or fall back to non-freezer based process
freezing on cgroup v2 system.
Signed-off-by: Adrian Reber <areber@redhat.com>
The line we are parsing looks like this
> flags : fpu vme de pse <...>
so look for "flags" as a prefix, not substring.
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
The Err() method should be called after the Scan() loop, not inside it.
Found by
git grep -A3 -F '.Scan()' | grep Err
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Remove joinCgroupsV2() function, as its name and second parameter
are misleading. Use createCgroupsv2Path() directly, do not call
getv2Path() twice.
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Function getSubsystemPath(), while works for v2 unified case, is
suboptimal, as it does a few unnecessary calls.
Add a simplified version of getSubsystemPath(), called getv2Path(),
and use it.
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
This code is a copy-paste from cgroupv1 systemd code. Its aim
is to check whether a subsystem is available, and skip those
that are not.
In case v2 unified hierarchy is used, getSubsystemPath never
returns "not found" error, so calling it is useless.
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Cgroup v1 kernel doc [1] says:
> We can write "-1" to reset the ``*.limit_in_bytes(unlimited)``.
and cgroup v2 kernel documentation [2] says:
> - If a controller implements an absolute resource guarantee and/or
> limit, the interface files should be named "min" and "max"
> respectively. If a controller implements best effort resource
> guarantee and/or limit, the interface files should be named "low"
> and "high" respectively.
>
> In the above four control files, the special token "max" should be
> used to represent upward infinity for both reading and writing.
Allow -1 value to still be used for v2, converting it to "max"
where it makes sense to do so.
This fixes the following issue:
> runc update test_update --memory-swap -1:
> error while setting cgroup v2: [write /sys/fs/cgroup/machine.slice/runc-cgroups-integration-test.scope/memory.swap.max: invalid argument
> failed to write "-1" to "/sys/fs/cgroup/machine.slice/runc-cgroups-integration-test.scope/memory.swap.max"
> github.com/opencontainers/runc/libcontainer/cgroups/fscommon.WriteFile
> /home/kir/go/src/github.com/opencontainers/runc/libcontainer/cgroups/fscommon/fscommon.go:21
> github.com/opencontainers/runc/libcontainer/cgroups/fs2.setMemory
> /home/kir/go/src/github.com/opencontainers/runc/libcontainer/cgroups/fs2/memory.go:20
> github.com/opencontainers/runc/libcontainer/cgroups/fs2.(*manager).Set
> /home/kir/go/src/github.com/opencontainers/runc/libcontainer/cgroups/fs2/fs2.go:175
> github.com/opencontainers/runc/libcontainer/cgroups/systemd.(*UnifiedManager).Set
> /home/kir/go/src/github.com/opencontainers/runc/libcontainer/cgroups/systemd/unified_hierarchy.go:290
> github.com/opencontainers/runc/libcontainer.(*linuxContainer).Set
> /home/kir/go/src/github.com/opencontainers/runc/libcontainer/container_linux.go:211
[1] linux/Documentation/admin-guide/cgroup-v1/memory.rst
[2] linux/Documentation/admin-guide/cgroup-v2.rst
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
To the best of my knowledge, it has been decided to drop the kernel
memory controller from the cgroupv2 hierarchy, so "kernel memory limits"
do not exist if we're using v2 unified.
So, we need to ignore kernel memory setting. This was already done in
non-systemd case (see commit 88e8350de), let's do the same for systemd.
This fixes the following error:
> container_linux.go:349: starting container process caused "process_linux.go:306: applying cgroup configuration for process caused \"open /sys/fs/cgroup/machine.slice/runc-cgroups-integration-test.scope/tasks: no such file or directory\""
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
1. rootfs is already validated to be kosher by (*ConfigValidator).rootfs()
2. mount points from /proc/self/mountinfo are absolute and clean, too
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Delete libcontainer/mount in favor of github.com/moby/sys/mountinfo,
which is fast mountinfo parser.
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
1. Return earlier if there is an error.
2. Do not use filepath.Split on every entry, use info.Name() instead.
3. Make readProcsFile() accept file name as an argument, to avoid
unnecessary file name and directory splitting and merging.
4. Skip on info.IsDir() -- this avoids an error when cgroup name is
set to "cgroup.procs".
This is still not very good since filepath.Walk() performs an unnecessary
stat(2) on every entry, but better than before.
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
fmt.Sprintf is slow and is not needed here, string concatenation would
be sufficient. It is also redundant to convert []byte from string and
back, since `bytes` package now provides the same functions as `strings`.
Use Fields() instead of TrimSpace() and Split(), mainly for readability
(note Fields() is somewhat slower than Split() but here it doesn't
matter much).
Use Join() to prepend the plus signs.
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Golang 1.14 introduces asynchronous preemption which results into
applications getting frequent EINTR (syscall interrupted) errors when
invoking slow syscalls, e.g. when writing to cgroup files.
As writing to cgroups is idempotent, it is safe to retry writing to the
file whenever the write syscall is interrupted.
Signed-off-by: Mario Nitchev <marionitchev@gmail.com>