Having map of per-subsystem paths in systemd unified cgroups
driver does not make sense and makes the code less readable.
To get rid of it, move the systemd v1-or-v2 init code to
libcontainer/factory_linux.go which already has a function
to deduce unified path out of paths map.
End result is much cleaner code. Besides, we no longer write pid
to the same cgroup file 7 times in Apply() like we did before.
While at it
- add `rootless` flag which is passed on to fs2 manager
- merge getv2Path() into GetUnifiedPath(), don't overwrite
path if it is set during initialization (on Load).
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
In this very case, the code is writing to cgroup2 filesystem,
and the file name is well known and can't possibly be a symlink.
So, using securejoin is redundant.
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Those needs to be run on the (Vagrant Fedora 31) host
(since we need real systemd running), and so we have
to have all the tools needed to compile runc and run
the tests.
The good news is Fedora packages a decent and recent release
of bats-core (1.1.0), which we can use (Debian does not),
and we can also use golang (currently 1.13.9) from Fedora.
The bad news are
1. Currently cgroups tests are only working with
RUNC_USE_SYSTEMD=yes (addressed by #2299, #2305)
2. Tests in events.bats do not work (need cgroupv2
memory.events support)
3. Fedora 31 image is 6 months old (and has broken
container-selinux policy) so we need `dnf update`,
which adds ~5 min to test time.
[v2: add -t to ssh to enforce pty]
[v3: disable events tests for cgroupv2]
[v4: update fedora packages, use a single dnf transation]
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
1. Add `cgroups_v1` and `cgroups_v2` options to `requires`.
2. Modify `check_cgroup_value` to be able to work with v2.
3. Split `test "update"` into two:
- (1) testing cgroupv1-only cpu shares and cfs
- (2) testing limits that are more or less common
between v1 and v2: memory/swap, pids, cpusets.
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
... and add kmem-tcp to cgroups kmem test.
First, we already have a separate kmem test in cgroups.bats.
Second, making kmem a requirement leads to skipping all the other
test cases in the update.bats test.
Third, kmem limit is being removed from the kernel, so it makes sense
to handle it separately.
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
This comment was added by commit 6cd425be2b (Allow update rt_period_us
and rt_runtime_us, Nov 4 2016), and the test case was added by commit
51baedf3f3 (Add integration for update rt period and runtime,
Nov 28 2016), making the comment obsolete.
Remove it.
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
1. Consolidate all the cgroup-related initialization code to
a single place, init_cgroup_paths(), so we can see which
variables are set.
2. Lazily call init_cgroup_paths() from all places that require it.
3. Don't set globals KMEM and RT_PERIOD.
4. Slightly clarlify variables naming:
- use OCI_CGROUPS_PATH for cgroupsPath in config.json
- use REL_CGROUPS_PATH for relative cgroups path
5. Do not hardcode the list of cgroup subsystems -- get it from
/proc/cgroup.
6. Preliminary support for cgroupv2 unified hierarchy (not yet working).
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Consolidate two implementations of check_cgroup_value()
into one, putting it into helpers.
Remove the first parameter, deducing the variable to get
the path from by the parameter name.
This should help in future cgroupv2 support.
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
If container's config.json have `"terminal": true` setting in its
"process" section, runc exec assumes that stdin (fd 0) is a terminal
and tries to use it.
This leads to the following error in case stdin is not a terminal:
> ERRO[0000] exec failed: provided file is not a console
So, even if -t/--tty is not set, exec uses stdin as a terminal.
It does not help that urfave/cli v1 parser we use does not allow
to use `-t no` or `-t false`.
Since the settings in config.json is probably for the container run/start,
not for the auxiliary process started inside a container with exec, do
not use a setting from there, only treating stdin as a terminal in case
`-t` is explicitly given.
Tests that use runc exec with a terminal are amended with -t.
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
When the cgroupv2 fs driver is used without setting cgroupsPath,
it picks up a path from /proc/self/cgroup. On a host with systemd,
such a path can look like (examples from my machines):
- /user.slice/user-1000.slice/session-4.scope
- /user.slice/user-1000.slice/user@1000.service/gnome-launched-xfce4-terminal.desktop-4260.scope
- /user.slice/user-1000.slice/user@1000.service/gnome-terminal-server.service
This cgroup already contains processes in it, which prevents to enable
controllers for a sub-cgroup (writing to cgroup.subtree_control fails
with EBUSY or EOPNOTSUPP).
Obviously, a parent cgroup (which does not contain tasks) should be used.
Fixes opencontainers/runc/issues/2298
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
`cgroup-v2` was marked `allow_failures` because of the flakiness of VirtualBox VM: dc7d0bf
The flakiness seems to have gone away since we switched from VirtualBox to QEMU/KVM and increased HW resources: b8eed86Close#2301
Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
Commit 6905b72154 treats all negative values as "max",
citing cgroup v1 compatibility as a reason. In fact, in
cgroup v1 only -1 is treated as "unlimited", and other
negative values usually calse an error.
Treat -1 as "max", pass other negative values as is
(the error will be returned from the kernel).
Fixes: 6905b72154
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
The resources.MemorySwap field from OCI is memory+swap, while cgroupv2
has a separate swap limit, so subtract memory from the limit (and make
sure values are set and sane).
Make sure to set MemorySwapMax for systemd, too. Since systemd does not
have MemorySwapMax for cgroupv1, it is only needed for v2 driver.
[v2: return -1 on any negative value, add unit test]
[v3: treat any negative value other than -1 as error]
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
otherwise the memoryStats and hugetlbStats maps are not initialized
and GetStats() segfaults when using them.
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
Make use of errors.Is() and errors.As() where appropriate to check
the underlying error. The biggest motivation is to simplify the code.
The feature requires go 1.13 but since merging #2256 we are already
not supporting go 1.12 (which is an unsupported release anyway).
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Using errors.Unwrap() is not the best thing to do, since it returns
nil in case of an error which was not wrapped. More to say,
errors package provides more elegant ways to check for underlying
errors, such as errors.As() and errors.Is().
This reverts commit f8e138855d, reversing
changes made to 6ca9d8e6da.
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
CI was failing on cgroup v2 because mockCgroupManager.GetUnifiedPath()
was returning an error.
Now the function returns the value of mockCgroupManager.unifiedPath,
but the value is currently not used in the tests.
Fix#2286
Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>