* TestConvertCPUSharesToCgroupV2Value(0) was returning 70369281052672, while the correct value is 0
* ConvertBlkIOToCgroupV2Value(0) was returning 32, while the correct value is 0
* ConvertBlkIOToCgroupV2Value(1000) was returning 4, while the correct value is 10000
Fix#2244
Follow-up to #2212#2213
Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
/proc/cgroups is meaningless for v2 and should be ignored.
https://github.com/torvalds/linux/blob/v5.3/Documentation/admin-guide/cgroup-v2.rst#deprecated-v1-core-features
* Now GetAllSubsystems() parses /sys/fs/cgroup/cgroup.controller, not /proc/cgroups.
The function result also contains "pseudo" controllers: {"devices", "freezer"}.
As it is hard to detect availability of pseudo controllers, pseudo controllers
are always assumed to be available.
* Now IOGroupV2.Name() returns "io", not "blkio"
Fix#2155#2156
Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
allow to set what subsystems are used by
libcontainer/cgroups/fs.Manager.
subsystemsUnified is used on a system running with cgroups v2 unified
mode.
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
The hugetlb cgroup control files (introduced here in 2012:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=abb8206cb0773)
use "KB" and not "kB"
(https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/mm/hugetlb_cgroup.c?h=v5.0#n349).
The behavior in the kernel has not changed since the introduction, and
the current code using "kB" will therefore fail on devices with small
amounts of ram (see
https://github.com/kubernetes/kubernetes/issues/77169) running a kernel
with config flag CONFIG_HUGETLBFS=y
As seen from the code in "mem_fmt" inside hugetlb_cgroup.c, only "KB",
"MB" and "GB" are used, so the others may be removed as well.
Here is a real world example of the files inside the
"/sys/kernel/mm/hugepages/" directory:
- "hugepages-64kB"
- "hugepages-2048kB"
- "hugepages-32768kB"
- "hugepages-1048576kB"
And the corresponding cgroup files:
- "hugetlb.64KB._____"
- "hugetlb.2MB._____"
- "hugetlb.32MB._____"
- "hugetlb.1GB._____"
Signed-off-by: Odin Ugedal <odin@ugedal.com>
The kernel will sometimes return EINVAL when writing a pid to a
cgroup.procs file. It does so when the task being added still has the
state TASK_NEW.
See: https://elixir.bootlin.com/linux/v4.8/source/kernel/sched/core.c#L8286
Co-authored-by: Danail Branekov <danailster@gmail.com>
Signed-off-by: Tom Godkin <tgodkin@pivotal.io>
Signed-off-by: Danail Branekov <danailster@gmail.com>
Cgroup namespace can be configured in `config.json` as other
namespaces. Here is an example:
```
"namespaces": [
{
"type": "pid"
},
{
"type": "network"
},
{
"type": "ipc"
},
{
"type": "uts"
},
{
"type": "mount"
},
{
"type": "cgroup"
}
],
```
Note that if you want to run a container which has shared cgroup ns with
another container, then it's strongly recommended that you set
proper `CgroupsPath` of both containers(the second container's cgroup
path must be the subdirectory of the first one). Or there might be
some unexpected results.
Signed-off-by: Yuanhong Peng <pengyuanhong@huawei.com>
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
Respect the container's cgroup path when finding the container's
cgroup mount point, which is useful in multi-tenant environments, where
containers have their own unique cgroup mounts
Signed-off-by: Danail Branekov <danailster@gmail.com>
Signed-off-by: Oliver Stenbom <ostenbom@pivotal.io>
Signed-off-by: Giuseppe Capizzi <gcapizzi@pivotal.io>
Fix duplicate entries and missing entries in getCgroupMountsHelper
Add test for testing cgroup mounts on bedrock linux
Stop relying on number of subsystems for cgroups
LGTMs: @crosbymichael @cyphar
Closes#1817
When there are complicated mount setups, there can be multiple mount
points which have the subsystem we are looking for. Instead of
counting the mountpoints, tick off subsystems until we have found them
all.
Without the 'all' flag, ignore duplicate subsystems after the first.
Signed-off-by: Daniel Dao <dqminh89@gmail.com>
The rootless cgroup manager acts as a noop for all set and apply
operations. It is just used for rootless setups. Currently this is far
too simple (we need to add opportunistic cgroup management), but is good
enough as a first-pass at a noop cgroup manager.
Signed-off-by: Aleksa Sarai <asarai@suse.de>
Runc needs to copy certain files from the top of the cgroup cpuset hierarchy
into the container's cpuset cgroup directory. Currently, runc determines
which directory is the top of the hierarchy by using the parent dir of
the first entry in /proc/self/mountinfo of type cgroup.
This creates problems when cgroup subsystems are mounted arbitrarily in
different dirs on the host.
Now, we use the most deeply nested mountpoint that contains the
container's cpuset cgroup directory.
Signed-off-by: Konstantinos Karampogias <konstantinos.karampogias@swisscom.com>
Signed-off-by: Will Martin <wmartin@pivotal.io>
The `bufio.Scanner.Scan` method returns false either by reaching the
end of the input or an error. After Scan returns false, the Err method
will return any error that occurred during scanning, except that if it
was io.EOF, Err will return nil.
We should check the error when Scan return false(out of the for loop).
Signed-off-by: Wang Long <long.wanglong@huawei.com>
Prior to this change a cgroup with a `:` character in it's path was not
parsed correctly (as occurs on some instances of systemd cgroups under
some versions of systemd, e.g. 225 with accounting).
This fixes that issue and adds a test.
Signed-off-by: Euan Kemp <euank@coreos.com>
`libcontainer/cgroups/utils.go` uses an incorrect path to the
documentation for cgroups. This updates the comment to use the correct
URL. Fixes#794.
Signed-off-by: Jim Berlage <james.berlage@gmail.com>
Avoid parsing the whole lines of mountinfo after all mountpoints
of the target subsytems are found, or when the target subsystem
is not enabled.
Signed-off-by: Tatsushi Inagaki <e29253@jp.ibm.com>
No substantial code change.
Note that some style errors reported by `golint` are not fixed due to possible compatibility issues.
Signed-off-by: Akihiro Suda <suda.kyoto@gmail.com>
Based on Golang document, %s is for "the uninterpreted bytes of the
string or slice", so %v is more appropriate.
Signed-off-by: Peng Gao <peng.gao.dut@gmail.com>
GetMounts is very cpu-expensive. I'll change other funcs in this package
to reuse code from GetCgroupMounts later.
Signed-off-by: Alexander Morozov <lk4d4@docker.com>
Bug was introduced in #250
According to: http://man7.org/linux/man-pages/man5/proc.5.html
36 35 98:0 /mnt1 /mnt2 rw,noatime master:1 - ext3 /dev/root rw,errors=continue
(1)(2)(3) (4) (5) (6) (7) (8) (9) (10) (11)
...
(7) optional fields: zero or more fields of the form
"tag[:value]".
The 7th field is optional. We should skip it when parsing mount info.
Signed-off-by: Qiang Huang <h.huangqiang@huawei.com>
This is needed because for nested containers cgroups. Without this patch
they creating unnecessary intermediate cgroup like:
/sys/fs/cgroup/memory/system.slice/docker-9409d9f0b68fb9e9d7d532d5b3f35e7c7f9cca1312af392ae3b28436f1f2998f.scope/system.slice/docker-9409d9f0b68fb9e9d7d532d5b3f35e7c7f9cca1312af392ae3b28436f1f2998f.scope/docker/908ebcc9c13584a14322ec070bd971e0de62f126c0cd95c079acdb99990ad3a3
It is because in /proc/self/cgroup we see paths from host, and they don't
exist in container.
Signed-off-by: Alexander Morozov <lk4d4@docker.com>